

FOUR  LECTURES  ON  ALGEBRAIC  LINGUISTICS 
AND  MACHINE  TRANSLATION 


A revised version of a series of lectures given in July, 1962 before
a NATO Advanced Summer Institute on Automatic Translation of
Languages in Venice, Italy.


by 

Yehoshua  Bar-Hillel 
Professor  of  Logic  and  Philosophy 
of  Science 

Hebrew  University,  Jerusalem,  Israel 



January,  1963 


The  work  upon  which  these  lectures  are  based  was  supported  by  the 
United  States  Office  of  Naval  Research,  Information  Systems  Branch 
under  Contract  N  62558-2214 




TABLE  OF  CONTENTS 


FIRST LECTURE: The role of grammatical models in machine translation
SECOND LECTURE: Syntactic complexity
THIRD LECTURE: Language and speech; theory vs. observation in linguistics
FOURTH LECTURE: Why machines won't learn to translate well

REFERENCES 


LECTURE 1: THE ROLE OF GRAMMATICAL MODELS IN MACHINE TRANSLATION


Linguistics,  as  every  other  empirical  science,  is  a  complex  mixture 
of  theory  and  observation.  The  precise  nature  of  this  mixture  is  still  not 
too  well  understood,  and  in  this  respect  the  difference  between  linguistics 
and, say, physics is probably at most one of degree. This lack of
methodological insight has often led to futile disputes between linguists and other
scientists  dealing  with  language,  such  as  psychologists,  logicians,  or 
communication  theoreticians,  as  well  as  among  linguists  themselves. 

Recently, however, considerable progress has been made in the
understanding of the function of theory in linguistics, as a result of which
theoretical linguistics has come into full-fledged existence. Interestingly
enough, the present customary name for this new subdiscipline is rather
mathematical linguistics. This is slightly unfortunate: though the
adjective 'mathematical' is quite all right if 'mathematical' is
understood in the sense of 'theory of formal systems', which is indeed one of
its many legitimate senses, it is misleading inasmuch as it is still
associated, at least among the non-specialists, including the bulk of the
linguists themselves, with numbers and quantitative treatment. That
subdiscipline of linguistics, however, which deals with numbers and
statistics should better be called statistical linguistics and rather
carefully be kept apart from mathematical linguistics qua theoretical
linguistics. Should one prefer to regard 'mathematical linguistics' as a term
for a genus of which statistical linguistics is a species, then the other
species should perhaps be named algebraic linguistics.

After this terminological aside which, I think, was not superfluous,
let us briefly sketch the background and development of algebraic
linguistics. In the hands of such authors as Harris [1] and Hockett [2] in the
United States, Hjelmslev [3] and Uldall [4] in Europe, structural
linguistics became more and more conscious of the chasm between theory and
observation, and linguistic theory deliberately got an algebraic look. At
the same time, Carnap [5] and the Polish logicians, especially Ajdukiewicz
[6], developed the logical syntax of language which was, however, too much
preoccupied with rules of deduction, and too little with rules of formation,
to exert a great influence on current linguistics. Finally, Post [7]
succeeded in formally assimilating rules of formation to rules of deduction,
thereby paving the way for the application of the recently developed powerful
theory of recursive functions, a branch of mathematical logic, to all
ordinary languages viewed as combinatorial systems [8], while Curry [9]
became more and more aware of the implications of combinatorial logic for
theoretical linguistics. It is, though, perhaps not too surprising that the
ideas of Post and Curry should be no better known to professional linguists
than those of Carnap and Ajdukiewicz.

It seems that a major change in the peaceful but uninspiring
coexistence of structural linguists and syntax-oriented logicians came along
when the idea of mechanizing the determination of syntactic structure began
to take hold of the imagination of various authors. Though this idea was
originally but a natural outcome of the professional preoccupation of a
handful of linguists and logicians, it made an almost sensational
breakthrough in the early fifties when it became connected with, and a
cornerstone of, automatic translation between natural languages. At one stroke,
structural linguistics had become useful. Just as mathematical logic,
regarded for years as the most abstract and abstruse scientific discipline,
became overnight an essential tool for the designer and programmer of
electronic digital computers, so structural linguistics, regarded for years
as the most abstract and speculative branch of linguistics, is now
considered by many a must for the designer of automatic translation routines.
The impact of this development was at times revolutionary and dramatic. In
Soviet Russia, for instance, structural linguistics had, before 1954,
unfailingly been condemned as idealistic, bourgeois and formalistic. However,
when the Russian government awakened from its dogmatic slumber to the tune
of the Georgetown University demonstration of machine translation in January
1954, structural linguistics became within a few weeks a discipline of high
prestige and priority. And just as mathematical logic has its special
offspring to deal with digital computers, i.e., the theory of automata, so
structural linguistics has its special offspring to deal with mechanical
structure determination, i.e., algebraic linguistics, also called, when
this application is particularly stressed, computational linguistics or
[illegible] linguistics. As a final surprise, it has recently turned out that
these two disciplines, automata theory and algebraic linguistics, exhibit
extremely close relationships which at times amount to practical identity.

To complete this historical sketch: around 1954, Chomsky, influenced
by, and in constant exchange of ideas with, Harris, started his
investigations into a new typology of linguistic structures. In a series of
publications, of which the booklet Syntactic Structures [10] is the best
known, but also the least technical, he defined and constantly refined
a complex hierarchy of such structures, meant to serve as models for
natural languages with varying degrees of adequacy. Though models for
the treatment of linguistic structures were also developed by many other
authors, Chomsky's publications exhibited a degree of rigor and testability
which was unheard of before that in the linguistic literature and therefore
quickly became for many a standard of comparison for other contributions.

I shall now turn to a presentation of the work of the Jerusalem group
in  linguistic  model  theory  before  I  continue  with  the  description  and 
evaluation  of  some  other  contributions  to  this  field. 

In 1937, while working on a master's thesis on the logical antinomies,
I came across Ajdukiewicz's work [6]. Fourteen years later, having become
acquainted in the meantime with structural linguistics, and especially with
the work of Harris [1], and instigated by my work at that time on machine
translation, I realized the importance of Ajdukiewicz's approach for the
mechanization of the determination of syntactic structure, and published
an adaptation of Ajdukiewicz's ideas [11].

The basic heuristic concept behind the type of grammar proposed in this
paper, and later further developed by Lambek [12, 13, 14], myself [15] and
others, is the following: the grammar was meant to be a recognition
(identification or operational) grammar, i.e., a device by which the
syntactic structure, and in particular the sentencehood, of a given string
of elements of a given language could be determined. This determination had to
be formal, i.e., dependent exclusively on the shape and order of the elements,
and preferably effective, i.e., leading after a finite number of steps to the
decision as to the structure, or structures, of the given string. This aim was
to be achieved by assuming that each of the finitely many elements of the
given natural language had finitely many syntactic functions, by developing a
suitable notation for these syntactical functions (or categories, as we became
used to calling them, in the tradition of Aristotle, Husserl, and Leśniewski),
and by designing an algorithm operating on this notation.

More specifically, the assumption was investigated that natural languages
have what is known to linguists as a contiguous constituent structure,
i.e., that every sentence can be parsed, according to finitely many rules, into
two or more contiguous constituents, each of which is already a final
constituent or else is itself parsable into two or more immediate constituents,
etc. This parsing was not supposed to be necessarily unique. Syntactically
ambiguous sentences allowed for two or more different parsings. Examples should
not be necessary here.

The variation introduced by Ajdukiewicz into this conception of linguistic
structure, well known in a crude form already to elementary school students,
was to regard the combination of constituents into constitutes (or syntagmata)
not as a concatenation inter pares but rather as the result of the operation of
one of the constituents (the governor, in some terminologies) upon the others
(the governed or dependent units). The specific form which the approach took
with Ajdukiewicz was to assign to each word (or other appropriate element) of
a given natural language a finite number of fundamental and/or operator
categories and to employ an extremely simple set of rules operating upon
these categories, so-called "cancellation" rules.




Just for the sake of illustration, let me give here the definition of
bidirectional categorial grammar, in a slight variation of the one presented
in a recent publication of our group [16]. We define it as an ordered quintuple
$\langle V, C, \Sigma, R, \mathfrak{A} \rangle$, where $V$ is a finite set of
elements (the vocabulary), $C$ is the closure of a finite set of fundamental
categories, say $\Psi_1, \ldots, \Psi_n$, under the operations of right and
left diagonalization (i.e., whenever $\alpha$ and $\beta$ are categories,
$[\alpha/\beta]$ and $[\alpha\backslash\beta]$ are categories), $\Sigma$ is a
distinguished category of $C$ (the category of sentences), $R$ is the set of
the two cancellation rules $[\alpha/\beta],\ \beta \to \alpha$ and
$\alpha,\ [\alpha\backslash\beta] \to \beta$, and $\mathfrak{A}$ is a function
from $V$ to finite sets of $C$ (the assignment function).

We say that a category sequence $\alpha$ directly cancels to $\beta$, if
$\beta$ results from $\alpha$ by one application of one of the cancellation
rules, and that $\alpha$ cancels to $\beta$, if $\beta$ results from $\alpha$
by finitely many applications of these rules (more exactly, if there exist
category sequences $\gamma_1, \gamma_2, \ldots, \gamma_n$ such that
$\alpha = \gamma_1$, $\beta = \gamma_n$, and $\gamma_i$ directly cancels to
$\gamma_{i+1}$, for $i = 1, \ldots, n-1$).

A string $x = A_1 \ldots A_k$ over $V$ is defined to be a sentence if, and only
if, at least one of the category sequences assigned to $x$ by $\mathfrak{A}$
cancels to $\Sigma$. The set of all sentences is then the language determined
(or represented) by the given categorial grammar. A language representable by
such a grammar is a categorial language.
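
The cancellation procedure just defined can be sketched in Python as follows. This is an illustration only, not the notation or any program of the Jerusalem group: the tuple encoding of operator categories, the function names, and the four-word lexicon are all hypothetical, and "s" and "n" are assumed names for the sentence category and one fundamental category.

```python
from itertools import product

# Encoding (an assumption of this sketch):
#   a fundamental category is a string, e.g. "n" or "s";
#   ("/", a, b)  stands for [a/b], with rule  [a/b], b -> a;
#   ("\\", a, b) stands for [a\b], with rule  a, [a\b] -> b.

def direct_cancellations(seq):
    """Yield every sequence obtained from seq by one rule application."""
    for i in range(len(seq) - 1):
        left, right = seq[i], seq[i + 1]
        if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
            yield seq[:i] + (left[1],) + seq[i + 2:]
        if isinstance(right, tuple) and right[0] == "\\" and right[1] == left:
            yield seq[:i] + (right[2],) + seq[i + 2:]

def cancels_to(seq, target):
    """True if seq cancels to the one-element sequence (target,)."""
    seen, stack = set(), [tuple(seq)]
    while stack:                      # every step shortens the sequence,
        cur = stack.pop()             # so the search always terminates
        if cur == (target,):
            return True
        if cur in seen:
            continue
        seen.add(cur)
        stack.extend(direct_cancellations(cur))
    return False

def is_sentence(words, assignment, sentence_cat="s"):
    """Try every category sequence the assignment allows for the string."""
    choices = [assignment.get(w, ()) for w in words]
    return any(cancels_to(seq, sentence_cat) for seq in product(*choices))

# Hypothetical toy lexicon.
N, S = "n", "s"
LEXICON = {
    "John": [N],
    "Mary": [N],
    "sleeps": [("\\", N, S)],             # [n\s]
    "likes": [("/", ("\\", N, S), N)],    # [[n\s]/n]
}
```

With this lexicon, `is_sentence(["John", "likes", "Mary"], LEXICON)` holds, since n, [[n\s]/n], n cancels first to n, [n\s] and then to s, whereas the string "sleeps John" has no cancelling category sequence.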

In addition to bidirectional categorial grammars, we also dealt with
unidirectional categorial grammars, employing either right or left
diagonalization only for the formation of categories, and more specifically with what
we called restricted categorial grammars, whose set of categories consists only
of the (finitely many) fundamental categories $\Psi_i$, and the operator
categories $[\Psi_i\backslash\Psi_j]$ and $[\Psi_i\backslash[\Psi_j\backslash\Psi_k]]$
(or, alternatively, $[\Psi_i/\Psi_j]$ and $[[\Psi_i/\Psi_j]/\Psi_k]$).

One of the results obtained by Gaifman in 1959 was that every language
determinable by a bidirectional categorial grammar can also be determined by a
unidirectional grammar and even by a restricted categorial grammar.


A heuristically (though not essentially) different approach to the
formalization of immediate-constituent grammars was taken by Chomsky,
within the framework of his general typology. He looked upon a grammar as a
device, or a system of rules, for generating (or recursively enumerating) the
class of all sentences. In particular, a context-free phrase structure
grammar, a CF grammar for short, may be defined, again in slight variation
from Chomsky's original definition, as an ordered quadruple
$\langle V, T, S, P \rangle$, where $V$ is the (total) vocabulary, $T$ (the
terminal vocabulary) is a subset of $V$, $S$ (the initial symbol) is a
distinguished element of $V - T$ (the auxiliary vocabulary), and $P$ is a
finite set of production rules of the form $X \to x$, where $X \in V - T$ and
$x$ is a string over $V$.

We say that a string $x$ directly generates $y$, if $y$ results from $x$
by one application of one of the production rules, and that $x$ generates $y$,
if $y$ results from $x$ by finitely many applications of these rules (more
exactly, if there exists a sequence of strings $s_1, s_2, \ldots, s_n$ such that
$x = s_1$, $y = s_n$ and $s_i$ directly generates $s_{i+1}$, for
$i = 1, \ldots, n-1$).

A string over $T$ is defined to be a sentence if it is generated by $S$.
The set of all sentences is the language determined (or represented) by the
given grammar.
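
The relations "directly generates" and "generates" can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than part of the original definition: the function names and the sample grammar are hypothetical, auxiliary symbols are taken to be exactly the left-hand sides of the productions, and no production has an empty right-hand side (so that strings never shrink and a length bound makes the search finite).

```python
from collections import deque

def directly_generates(string, productions):
    """Yield every string obtained by rewriting one auxiliary symbol once."""
    for i, symbol in enumerate(string):
        for rhs in productions.get(symbol, ()):
            yield string[:i] + tuple(rhs) + string[i + 1:]

def sentences_up_to(productions, start, max_len):
    """Return all terminal strings of length <= max_len generated by start."""
    found, seen, queue = set(), {(start,)}, deque([(start,)])
    while queue:
        cur = queue.popleft()
        if all(sym not in productions for sym in cur):
            found.add("".join(cur))       # only terminal symbols are left
            continue
        for nxt in directly_generates(cur, productions):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found

# Hypothetical grammar with the two rules  S -> a S b  and  S -> a b.
P = {"S": [("a", "S", "b"), ("a", "b")]}
```

Here `sentences_up_to(P, "S", 6)` collects the sentences ab, aabb and aaabbb, each reached from S by finitely many direct generation steps.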

My conjecture that the classes of CF languages and bidirectional
categorial languages are identical - in other words, that for each CF language
there exists a weakly equivalent bidirectional categorial language and vice
versa - was proved in 1959 by Gaifman [16], by a method that is too complex
to be described here. He proved, as a matter of fact, slightly more, namely
that for each CF grammar there exists a weakly equivalent restricted
categorial grammar and vice versa. The equivalent representation can in all cases
be effectively obtained from the original representation.

This equivalence proof was preceded by another in which it was shown that
the notion of a finite state grammar, FS grammar for short, occupying the
lowest position in Chomsky's hierarchy of generation grammars, was equivalent
to that of a finite automaton, in the sense of Rabin and Scott [17], which can
be viewed as another kind of recognition grammar. The proof itself was rather
straightforward and almost trivial, relying mainly on the equivalence of
deterministic and non-deterministic finite automata, shown by Rabin and Scott.
It has been adequately described in a recently published paper [18].
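
The equivalence of deterministic and non-deterministic finite automata on which this proof relies is usually established by the subset construction. The following Python sketch shows that construction in its textbook form; it is not taken from [17] or [18], and the dictionary encoding of the transition table, the function names, and the sample automaton are assumptions made for the illustration (no empty moves are allowed, and input symbols must belong to the automaton's alphabet).

```python
def determinize(nfa, start, accepting):
    """Subset construction. nfa maps (state, symbol) to a set of next states."""
    alphabet = {sym for (_, sym) in nfa}
    start_set = frozenset([start])
    dfa, seen, todo = {}, {start_set}, [start_set]
    while todo:
        cur = todo.pop()
        for sym in alphabet:
            # the deterministic successor is the set of all possible successors
            nxt = frozenset(t for s in cur for t in nfa.get((s, sym), ()))
            dfa[(cur, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    dfa_accepting = {q for q in seen if q & accepting}
    return dfa, start_set, dfa_accepting

def dfa_accepts(dfa, start_set, dfa_accepting, word):
    """Run the deterministic automaton on word."""
    state = start_set
    for sym in word:
        state = dfa[(state, sym)]
    return state in dfa_accepting

# Hypothetical non-deterministic automaton accepting the strings over {a, b}
# that end in "ab".
NFA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
DFA, START, ACCEPTING = determinize(NFA, 0, {2})
```

The deterministic automaton obtained in this way accepts "aab" and rejects "aba", exactly as the non-deterministic one does.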

Chomsky had already shown that the FS languages formed a proper subclass
of the CF languages. We have recently been able to prove [19] that the problem
whether a CF language is also representable by a FS grammar - a problem which
has considerable linguistic importance - is recursively unsolvable. The method
used was reduction to Post's correspondence problem, a famous problem in
mathematical logic which was shown by Post [20] to be recursively unsolvable.

Among other results recently obtained, let me only mention the following:
whereas FS languages are, in view of the equivalence of FS grammars to finite
automata and well-known results of Kleene [21] and others, closed under various
Boolean and other operations, CF languages whose vocabulary contains at least
two symbols are not closed under complementation and intersection, though closed
under various other operations. The union of two CF languages is again a CF
language, and a representation can be effectively constructed from the given
representations. [One sentence illegible in the source.]
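
That a representation of the union can be effectively constructed may be illustrated by the familiar textbook construction: keep the two rule sets apart and add a new initial symbol that rewrites to either of the old ones. The source does not say which construction was actually used, so the Python sketch below is an assumption throughout, including the function names, the tagging of auxiliary symbols, the name "S0" for the new initial symbol, and the two sample grammars.

```python
from collections import deque

def union_grammar(p1, s1, p2, s2, new_start="S0"):
    """Return (productions, start) for a grammar representing L(G1) | L(G2).

    Auxiliary symbols are tagged with 1 or 2 so the two rule sets cannot
    interfere; new_start is assumed not to occur as a terminal symbol.
    """
    def tag(productions, k):
        def rename(sym):
            return (sym, k) if sym in productions else sym
        return {(lhs, k): [tuple(rename(s) for s in rhs) for rhs in rules]
                for lhs, rules in productions.items()}

    combined = {**tag(p1, 1), **tag(p2, 2)}
    combined[new_start] = [((s1, 1),), ((s2, 2),)]
    return combined, new_start

def sentences_up_to(productions, start, max_len):
    """Enumerate terminal strings of length <= max_len (no empty right-hand sides)."""
    found, seen, queue = set(), {(start,)}, deque([(start,)])
    while queue:
        cur = queue.popleft()
        if all(sym not in productions for sym in cur):
            found.add("".join(cur))
            continue
        for i, sym in enumerate(cur):
            for rhs in productions.get(sym, ()):
                nxt = cur[:i] + tuple(rhs) + cur[i + 1:]
                if len(nxt) <= max_len and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return found

G1 = {"S": [("a", "S", "b"), ("a", "b")]}   # hypothetical: a...ab...b, equal numbers
G2 = {"S": [("c", "S"), ("c",)]}            # hypothetical: one or more c's
UNION, START = union_grammar(G1, "S", G2, "S")
```

The sentences of the combined grammar up to a given length are then exactly those of the first grammar together with those of the second.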

Undecidable are such problems as the equivalence problem between two CF
grammars, [illegible], the problem of disjointedness of such languages, etc.
In this connection, interesting [illegible] automata, as defined and treated
by Rabin and Scott, for which the disjointedness problem of the sets of
acceptable tapes is similarly unsolvable.

A particular proper subset of the CF languages, apparently of greater
importance for the treatment of programming languages, such as ALGOL, than for
natural languages, is the set of so-called sequential languages, studied in
particular by Ginsburg [22, 23] and Shamir [24]. I have no time for more than
just this remark.




In a somewhat different approach, closely related to the classical
notions of government and syntagmata, the notions of dependency grammars
and projective grammars have been developed by Hays [25], Lecerf [26], and
others, including some Russian authors, utilizing ideas most fully presented
in Tesnière's posthumous book [27], and are thought to be of particular
importance for machine translation. However, it has not been too difficult
to guess, and has indeed been rigorously proven by Gaifman [28], that these
grammars, which are being discussed in other lectures presented in this
Institute, are equivalent to CF grammars in a certain sense, which is
somewhat stronger than the one used above, but that this is not necessarily so
with regard to what might be called natural strong equivalence. More precisely,
whereas for every dependency grammar there exists, and can be effectively
constructed, a CF grammar naturally and strongly equivalent to it, this is not
necessarily the case in the opposite direction, not if the CF grammar is of
infinite degree. Let me add that the dependency grammars are very closely
related to a type of categorial grammars which I discussed in earlier
publications [11] but later on replaced by grammars of a seemingly simpler
structure.  In  the  original  categorial  grammars,  I  did  consider  categories  of 
the form $\beta_1 \ldots \beta_m \backslash \alpha / \gamma_1 \ldots \gamma_n$, with $\alpha$, $\beta_i$ and $\gamma_j$ being either
fundamental  or  operator  categories  themselves,  with  a  corresponding  cancellation 
rule. It should be rather obvious how to transform a dependency grammar into
a categorial grammar of this particular type. These grammars are equivalent to
grammars in which all categories have the form $\beta\backslash\alpha/\gamma$,
where $\alpha$, $\beta$, and $\gamma$ are fundamental categories and where
$\beta$ and $\gamma$ may be empty (in which case the corresponding diagonal
will be omitted, too, from the symbol). Finally, in view of Gaifman's theorem
mentioned above, these grammars in their turn are equivalent to grammars all
of whose categories are of the form $\alpha/\beta\gamma$ (or
$\gamma\beta\backslash\alpha$), with the same conditions. I think that these remarks (strongly
connected  with  considerations  of  combinatory  logic  [9])  should  definitely 
settle  the  question  of  the  exact  formal  status  of  the  dependency  grammars  and 
their like. One side result is that dependency grammars are weakly reducible
to binary dependency grammars, i.e., grammars in which each unit governs
at most two other units. This result, I presume, is not particularly
surprising, especially if we remember that the equivalence proven will in general
not be a natural one.

Still another class of grammars, sometimes [29] called push-down store
grammars and originating, though not in a very precise form, with Yngve [30,
31], has recently been shown by Chomsky to be once more equivalent to CF
grammars, again to nobody's particular surprise. Since push-down stores are
regarded by many workers in the fields of MT and programming languages as
particularly useful devices for the mechanical determination of syntactic
structure of sentences belonging to natural and programming languages,
respectively, this result should be helpful in clarifying the exact scope of those
schemes of syntactic analysis which are based on these devices.
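
How a push-down store can serve for recognizing the sentences of a CF grammar may be sketched as follows. This is the standard top-down construction found in textbooks, not Yngve's mechanism and not Chomsky's equivalence proof; the function name, the encoding of productions, and the sample grammar are hypothetical, and the sketch assumes that no production has an empty right-hand side.

```python
def pushdown_accepts(productions, start, word):
    """Nondeterministic top-down recognition with an explicit push-down store.

    The store holds the symbols still expected, topmost first. Because no
    right-hand side is empty, every stored symbol needs at least one input
    symbol, so a store longer than the unread input can be abandoned.
    """
    word = tuple(word)
    todo, seen = [(0, (start,))], set()
    while todo:
        pos, store = todo.pop()
        if (pos, store) in seen:
            continue
        seen.add((pos, store))
        if not store:
            if pos == len(word):
                return True               # store empty and input exhausted
            continue
        if len(store) > len(word) - pos:
            continue
        top, rest = store[0], store[1:]
        if top in productions:            # auxiliary symbol: replace it by a right-hand side
            for rhs in productions[top]:
                todo.append((pos, tuple(rhs) + rest))
        elif word[pos] == top:            # terminal symbol: match it against the input
            todo.append((pos + 1, rest))
    return False

# Hypothetical grammar with the two rules  S -> a S b  and  S -> a b.
P = {"S": [("a", "S", "b"), ("a", "b")]}
```

For this grammar the recognizer accepts "aabb" and rejects "aab" and "abab"; the only memory it uses beyond the input position is the store itself.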

Of  theoretically  greater  importance  is  the  fact  that  push-down  store 
grammars  form  a  proper  sub-set  of  linear  bounded  automata,  one  of  the  many 
classes  of  automata  lying  between  Turing  machines  and  finite  automata  which 
have  recently  been  investigated  by  many  authors,  due  to  the  fact  that  Turing 
machines  are  too  idealised  to  be  of  much  direct  applicability,  whereas  finite 
automata  are  too  restricted  for  this  purpose. 

The investigation of these automata, initiated by Myhill [32], is, however,
still in its infancy, similar to that of many other classes of automata reported
by McNaughton in his excellent review [33]. Still more in the dark is the
linguistic relevance of all these models though, judging from admittedly limited
experience, almost every single one of them will sooner or later be shown to
have such relevance.

To wind up this discussion, let me only mention that during the last few
years various classes of grammars whose potency is intermediate between FS and
CF grammars have been investigated. These intermediate grammars will probably
turn out to be of greater importance for the study of grammars of programming
and other artificial formalized languages than for natural languages. In
addition to the sequential grammars mentioned before, let me now mention the
linear and metalinear grammars studied by Chomsky.

It might be useful to present, at this stage, a picture of the various
grammars discussed in the present lecture, together with the two important
classes of transformational and context-sensitive phrase structure grammars
(which I could not discuss, for lack of time) in the form of a directed graph
based on the (partial) ordering relation
determines-a-more-extensive-class-of-languages-than (the staggered lines
indicating that the exact relationship has not yet been fully determined):


[Figure: directed graph of the classes of grammars, ordered by the relation just described.]


The last two questions I would now like to discuss are the following:
(1) In view of the fact that so many models of linguistic structure have
turned out to be (weakly) equivalent, how do they compare from the point of
view of pedagogy and MT-directed application? (2) What is the degree of
adequacy with which natural languages can be described by CF grammars and
their equivalents?

As to the first question, I am afraid that not much can be said at this
stage. I am not aware of any experiments made as yet to determine the
pedagogical status of the various equivalent grammars. Some programmatic
statements have been made on occasion, but I would not want to attribute much
weight to them. I myself, for instance, have a feeling that the
governor-dependent terminology of the dependency and projective grammars has an
unfortunate, and intrinsically of course unwarranted, side-effect of
strengthening dogmatic approaches to the decision of what governs what. The
operator-operand terminology of the categorial grammars seems to be
emotionally less loaded, but again, these are surely minor issues. Altogether, I
would advocate the performance of pedagogical experiments in which the same
miniature language would be taught with the help of various equivalent grammars.
I do not foresee any particular complications for such projects.

Turning now to the second question, which has been much discussed during
the last few years, often with great fervor, the situation should be reasonably
clear. FS grammars are definitely inadequate for describing any natural language,
unless this last term is mutilated, for what must be regarded as arbitrary and
ad hoc reasons. I am sorry that Yngve's otherwise extremely useful recent
contributions did becloud this issue. As to CF grammars, the situation is more
complex and more interesting. It is almost, but not quite, certain that such
grammars, too, are inadequate in principle, for reasons which I shall not repeat
here, since they have been stated many times in the recent literature and been
authoritatively restated by Chomsky [28]. But of even greater importance,
particularly for applications, such as MT, is the fact that such grammars seem
definitely to be inadequate in practice, in the sense that the number and
complexity of grammatical rules of this type, in order to achieve a tolerable,
if not perfect, degree of adequacy, will have to be so immense as to defeat
the practical purpose of establishing these rules. Transformational grammars
seem to have a much better chance of being both adequate and practical, though
this point is still far from being settled. In view of this fact, which does
not appear to have been seriously challenged by most workers on MT, it is
surprising to see that most, if not all, current programs of automatic
syntactic analysis are based on impractical grammars. In some groups, where the
impracticability and/or inadequacy has received serious attention, attempts
are being made at present to classify the "recalcitrant" phenomena and to find
ad hoc remedies for them. You will not be surprised if I say that I take a
rather dim view of these attempts. But this already leads to issues which I
intend to discuss in subsequent lectures.


SECOND LECTURE: SYNTACTIC COMPLEXITY


Extremely little is known about syntactic complexity, though this
notion has come up in many discussions of style, readability, and, more
recently, of mechanization of syntactic analysis. Its explication has
been universally regarded as a matter of great difficulty, this probably
being the reason why it has also been, to my knowledge, universally
shunned. When such authors as Flesch [34] developed their readability
measures, they could not help facing the problem but, unable to cope
with it, replaced syntactic complexity in their formulas by length,
whose measure poses incomparably fewer problems, while still standing
in some high statistical correlation with the elusive syntactic
complexity.
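
How length stands in for syntactic complexity in such measures can be seen from the widely quoted Flesch Reading Ease formula, given here as a Python sketch on the assumption that it is representative of the measures meant in [34]. The function and parameter names are hypothetical; the counts of words, sentences and syllables are taken as given, since counting them is a separate problem.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease score computed from raw counts.

    The only trace of sentence structure in the formula is the average
    sentence length; nothing about nesting or embedding enters into it.
    """
    average_sentence_length = words / sentences
    average_syllables_per_word = syllables / words
    return (206.835
            - 1.015 * average_sentence_length
            - 84.6 * average_syllables_per_word)
```

Two texts with the same word, sentence and syllable counts receive the same score, however differently their sentences are built.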

Very often one hears, or reads, of an author, a professional group,
or even a whole linguistic community being accused of expressing
themselves with greater syntactic complexity than necessary. Such slogans
as "What can be said at all, can be said simply and clearly in any
civilized language, or in a suitable system of symbols.", formulated
by the British philosopher C. D. Broad in elaboration of a well-known
dictum by Wittgenstein, were used by philosophers of certain schools to
criticize philosophers of other schools, and have gained particular
respectability in this context. On a less exalted level, most people
interested in information processing and, in particular, in the
condensation of information, preferably by machine, seem to be convinced
that most, if not all, of what is ordinarily said could be said not only
in syntactically simpler sentences but in syntactically simple sentences,
the analysis of which would be a pleasure for a machine. Often,
information-lossless transformation into syntactically simple sentences
is regarded as a helpful, perhaps even necessary step prior to further
processing. In the context of machine translation, Harris, e.g., once
expressed the hunch that mechanical translation of kernel sentences,
which would presumably rank lowest on any scale of syntactic complexity,
should be a simpler affair than translation of any old sentences.



It is my conviction that the topic of syntactic complexity is,
beyond certain very narrow limits of a vaguely felt consensus, ridden
with bias, prejudice and fallacies to such a degree as to make almost
everything that has been said on it completely worthless. In particular,
I think that the "Wittgensteinian" slogan mentioned above is misleading
in the extreme. I tend to believe that its attractiveness is due to
its being understood not as a statement of fact but rather as a kind of
general and vague advice to say whatever one wants to say as simply and
clearly "as possible," something to which one could hardly object,
though, as we shall see, even in this interpretation it is not
unequivocally good advice, when simplicity is understood as syntactic
simplicity, since the price to be paid for reducing syntactic complexity,
even when it is "possible," may well turn out to be too high.

So far, I have been using "syntactic complexity" in its
pre-theoretical and unanalysed vague sense. It is time to become more
systematic.

One  should  not  be  surprised  that  the  explication  of  syntactic 
complexity  to  which  we  shall  presently  turn  will  reveal  that  the  pre- 
theoretical  term  is  highly  equivocal,  though  one  might  well  be  surprised 
to  learn  how  equivocal  it  is. 

When I said in the opening phrase that "extremely little is known
about syntactic complexity," I intended the modifier "extremely little"
to be understood literally and not as a polite version of "nothing."

Such terms as "nesting," "discontinuous constituents," "self-embedding"
and "syntactic depth" are being used with increasing frequency by linguists
in general and - perhaps unfortunately so - by applied linguists in
particular, especially when programming for machine analysis is discussed.
But not until very recently have these notions been provided with a
reasonably rigid formal definition which alone makes possible their
responsible discussion. The most recent and most elaborate discussion
that has come to my attention is that by Chomsky and Miller [35]. They
discuss there various explicata for "syntactic complexity," with
varying degrees of tentativeness, as befits such a first attempt, and
I shall make much use of this treatment in what follows.

Let me first discard one notion which, as already mentioned, has a
certain prima facie appeal to serve as a possible explicatum for
syntactic complexity, namely length, measured, say, by the number of
words in the sentence (or in whatever other construction is under
investigation). Though, as said before, it is obvious that there should
exist a fairly high statistical correlation between syntactic complexity
and length, it should be equally obvious that length is entirely
inadequate to serve as an explicatum for syntactic complexity. Take as
many sentences as you wish of the form "... is ---" (such as "John is
hungry.", "Paul is thirsty.", etc.) whose intuitive degree of
syntactic complexity is close, if not equal, to the lowest one possible,
join them by repeated occurrences of "and" (a procedure resulting in
something like "John is hungry and Paul is thirsty and Mary is sleepy
and..."), and you will get sentences of any length you wish whose
intuitive degree of syntactic complexity should still be close to the
minimum. True enough, a sentence of this form, containing 50
clauses of the type mentioned, always with different proper names
in the first position and different adjectives in the third position,
would be difficult to remember exactly. Therefore such a sentence will
be "complex," in one of the many senses of this word, but surely not
syntactically so. No normal English speaking person will have the
slightest difficulty in telling the exact syntactic form, up to a
parameter, of the resulting sentence, and there will be no increase
in this difficulty even if the number of clauses will be 100, 1000,
or any number you wish. In one very important sense of "understanding,"
the increased length of sentences of this type will not increase the
difficulty of understanding them. And the sense in question is, of
course, precisely that of grasping the syntactic structure.

The next remark, prior to presenting some of the more interesting
explicata, refers to a fact which I want very much to call to your careful
attention. I hope it will not be as surprising to you as it was to me,
the first time I hit upon it. For a time, I thought that the only
relativization needed for explicating syntactic complexity would be the
trivial one to a given language. (Logicians, and some linguists, know
plenty of examples where the "same" sentence may belong to entirely
different languages; in that case, nobody would be surprised to learn that
it also has - or rather that they also have - different degrees of
syntactic complexity, relative to their respective languages.) What did
shock me, however, though only for a moment until I realized that it
could not be otherwise, was that degree of complexity must also be
explicated as being relative to a grammar, that the same sentence of the
same language may have one degree of complexity when analysed from the
point of view of one grammar and a different one when analysed from the
point of view of another grammar, and that, of two different sentences,
one may have a higher degree of complexity than the other relative to
one grammar, but a lower degree relative to another grammar.

This doubtless being the case, may I be allowed a certain amount
of speculation for a minute? It is a simple and well-known fact that
the same sentence will sometimes be better understood by person A than
by B, though they have about the same IQ, about the same background
knowledge, and though they read or hear it with about equal attention,
as far as one can make out. Could it be that they are (subconsciously,
of course) analysing this same sentence according to different grammars,
relative to which this sentence has different degrees of syntactic
complexity? Could it be that part of the improvement in understanding
obtained through training and familiarisation is due to the trainee's
learning to employ another grammar (whose difference from the one he
was accustomed to employ before might be only minimal, so that the
acquisition of this new grammar might not have been too difficult,
perhaps)? Could it be that many, if not all, of us work with more
than one grammar simultaneously, switching from the one to the other
when the employment of the one runs us into trouble, e.g., when
according to one grammar the degree of complexity of a given sentence
is greater than one can stand? More about this later. Attractive
as these speculations are, let me stress that at this moment I don't
know of any way of putting them to a direct empirical test. But I
wish someone would think up such a way. Let me also add that he who
does not like this picture of different grammars for the same language
lying peacefully side by side somewhere in our brain, may look upon
the situation as one system of grammatical rules (the set-theoretical
union of the two sets discussed so far) being stored in the brain,
and allowing the same sentence to be analysed and understood in two
different ways with two different degrees of complexity, with a control
element deciding which rules to apply in a given case and allowing the
switch to other rules when trouble strikes. That there are syntactically
ambiguous sentences has, of course, always been well known, but I am
speaking at the moment about a particular kind of syntactic ambiguity,
one that has no semantic ambiguities in its wake, but where the difference
in the analysis still creates a difference in comprehensibility. At
this point it is probably worthwhile to present an extremely simple
example. The English sentence, "John loves Mary.", can be analysed
(and has been analysed) in two different ways, each of which will be
expressed here in two different but equivalent notations which have
been simplified for our present purposes:


(S(NP John)(VP(Vt loves)(NP Mary)))          (S(NP John)(Vt loves)(NP Mary))

             S                                           S
           /   \                                       / | \
         NP     VP                                   NP  Vt  NP
         |     /  \                                  |   |    |
       John   Vt   NP                              John loves Mary
              |     |
            loves  Mary

These analyses correspond to the following two "grammars," G1 and G2:

G1:  S  → NP + VP               G2:  S  → NP + Vt + NP
     VP → Vt + NP                    NP → John, Mary
     NP → John, Mary                 Vt → loves
     Vt → loves


or, if you prefer, they both correspond to the grammar G3, which is the
set-theoretical union of G1 and G2, and consists therefore of just the
rules of G1 plus the first rule of G2. (Both G1 and G2 are of course
CF-grammars; G1 is binary, but G2, and therefore also G3, is not.)

Though the difference in structure assigned to this sentence by the
two analyses is palpable, it is less clear whether this difference implies
a difference in the intuitive degree of syntactic complexity, and if so,
according to which analysis the sentence is more complex. As a matter of
fact, good reasons can be given for both views: in the first analysis,
more rules are applied but each rule has a particularly simple form; in
the second analysis fewer rules are applied, but one of them has a more
complicated form. This situation seems to indicate that we have more
than one explicandum before us, more than one notion which, in the
pre-theoretical stage, is entitled to be called "syntactic complexity."

There are still more aspects to the intuitive uses of "syntactic
complexity," but perhaps it is time to turn directly to the explicata
which, hopefully, will take care of at least some of these aspects.

To follow Chomsky once again [35] rather closely, we might introduce
the terms "depth of postponed symbols" and "node/terminal-node ratio,"
to denote the following two relevant measures: the first for Yngve's
well-known depth-measure, which, I trust, will again be explained in his
lectures at this Institute, the second for a new concept which has not
yet been discussed in the literature. Both measures refer to the tree
representing the sentence and are therefore applicable only to such
grammars which assign tree structure to each sentence generated by them.

If we assign, in the Yngve fashion, numbers to the nodes and branches
(with the branches leading to the terminal symbols left out), we see that
the greatest number assigned to any of the nodes of the left tree is 1,
so that its depth of postponed symbols is also 1, whereas the corresponding
number for the second tree is 2. On the other hand, the total number of
nodes of the first tree is 5, the number of its terminal nodes is 3, so
that its node/terminal-node ratio is 5/3, whereas the corresponding numbers
for the second tree are 4, 3, and 4/3 respectively.


Each node number (in parentheses) is equal to the sum of the number assigned
to the branch leading to this node and the number of the node from which the
branch starts.
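
The two measures just described can be checked mechanically on the two
analyses of "John loves Mary." The following Python sketch is offered only
as an illustration. The encoding of trees as nested tuples and the names
yngve_depth and count_nodes are assumptions made here and do not come from
the lectures or from [35]. Branches are numbered 0, 1, 2, ... from right to
left, and words are not counted as nodes.

```python
# A tree is (label, child, child, ...). A child that is a plain string is a
# terminal symbol (a word); branches leading to words are left out.
BINARY = ("S", ("NP", "John"), ("VP", ("Vt", "loves"), ("NP", "Mary")))
FLAT = ("S", ("NP", "John"), ("Vt", "loves"), ("NP", "Mary"))


def subtrees(tree):
    """Return the children of `tree` that are nodes (words are skipped)."""
    return [child for child in tree[1:] if isinstance(child, tuple)]


def yngve_depth(tree, number=0):
    """Largest node number, where each node gets its parent's number plus
    the number of the branch leading to it (branches numbered from the
    right, starting with 0)."""
    kids = subtrees(tree)
    best = number
    for i, child in enumerate(kids):
        branch = len(kids) - 1 - i
        best = max(best, yngve_depth(child, number + branch))
    return best


def count_nodes(tree):
    """Return (all nodes, terminal nodes); a terminal node dominates only
    a word."""
    kids = subtrees(tree)
    if not kids:
        return 1, 1
    total, terminal = 1, 0
    for child in kids:
        sub_total, sub_terminal = count_nodes(child)
        total += sub_total
        terminal += sub_terminal
    return total, terminal


for name, tree in (("binary", BINARY), ("flat", FLAT)):
    nodes, terminal = count_nodes(tree)
    print(name, yngve_depth(tree), f"{nodes}/{terminal}")
```

Under these assumptions the sketch gives depth 1 and ratio 5/3 for the
binary analysis, and depth 2 and ratio 4/3 for the flat one, in agreement
with the figures quoted above.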

There are at least three more notions that are entitled to be considered as
explicata for other aspects of syntactic complexity. The one that has been
most studied is the degree of nesting. The reasons for the attention given
to it are that it has been known for a long time that a highly nested
sentence causes difficulties in comprehension and, more recently, that it
creates troubles for mechanical syntactic analysis. One rough explication of
this notion (there are others) might run as follows, again relative to tree
grammars: The degree of nesting of a labeled tree is the largest integer m,
such that there exists in this tree a path through m+1 nodes
N_1, ..., N_{m+1}, with the same or different labels, where each N_i (i > 1)
is an inner node in the subtree rooted in N_{i-1}. The same degree of
nesting is also assigned to the terminal expression as analysed by this
tree.

A special case of nesting is self-embedding, to whose importance Chomsky
has called attention. In order to define the degree of self-embedding of a
labeled tree, one has only to replace in the above definition of degree of
nesting the phrase "with the same or different labels" by the phrase "each
with the same label." (Other definitions are again possible.)

To present one more stock example, the following tree has a degree of
nesting (equal, in this particular case, to its degree of self-embedding)
of 4. (Its depth, incidentally, is 7 and its node/terminal-node ratio is
21/15 = 7/5.)
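
The figures quoted for the stock example can be reproduced on a tree built
by applying a rule of the form NP → NP + Ra + NP + Vt four times, each time
to the inner NP of the subject. The Python sketch below is a hedged
illustration only. The tree is an assumed reconstruction, the placeholder
words "name" and "verb" are hypothetical, and "inner node" is read here as
"a node whose words lie strictly inside the words covered by the root of the
subtree," which is one possible reading of the rough definition above.

```python
def embedded_np(levels):
    """Apply NP -> NP + Ra + NP + Vt `levels` times to the inner NP."""
    if levels == 0:
        return ("NP", "name")
    return ("NP", ("NP", "name"), ("Ra", "whom"),
            embedded_np(levels - 1), ("Vt", "verb"))


# Assumed reconstruction: four embeddings inside the subject of S -> NP + VP.
TREE = ("S", embedded_np(4), ("VP", ("Vt", "verb"), ("NP", "name")))


def annotate(tree, start=0):
    """Return (label, first word, one past last word, child nodes)."""
    position, nodes = start, []
    for child in tree[1:]:
        if isinstance(child, tuple):
            node = annotate(child, position)
            nodes.append(node)
            position = node[2]
        else:
            position += 1  # a word; words are not counted as nodes
    return (tree[0], start, position, nodes)


def descendants(node):
    for child in node[3]:
        yield child
        yield from descendants(child)


def chain(node, same_label):
    """Longest chain below `node` in which every next node lies strictly
    inside the words covered by the previous one."""
    best = 0
    for d in descendants(node):
        inner = node[1] < d[1] and d[2] < node[2]
        if inner and (not same_label or d[0] == node[0]):
            best = max(best, 1 + chain(d, same_label))
    return best


def degree(tree, same_label=False):
    """Degree of nesting, or of self-embedding when same_label is True."""
    root = annotate(tree)
    return max(chain(n, same_label) for n in [root, *descendants(root)])


def yngve_depth(node, number=0):
    kids = node[3]
    return max([number] + [yngve_depth(k, number + len(kids) - 1 - i)
                           for i, k in enumerate(kids)])


root = annotate(TREE)
nodes = [root, *descendants(root)]
terminal_nodes = [n for n in nodes if not n[3]]
print(degree(TREE), degree(TREE, same_label=True))
print(yngve_depth(root), f"{len(nodes)}/{len(terminal_nodes)}")
```

On this reconstruction the sketch yields a degree of nesting of 4, a degree
of self-embedding of 4, a depth of 7 and a ratio of 21/15, which matches
the values stated in the text. Other readings of "inner node" may give
different numbers for other trees.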




Though this tree could have been derived from a grammar G4 differing
from G1 only by containing the additional rules

NP → NP + Ra + NP + Vt
Ra → whom

there are very good reasons why sentences of the type

John whom Ann hates loves Mary

and their ramifications should, in the framework of the whole English
language, not be regarded as being produced by a CF-grammar containing G4
as a proper part, but rather by a transformational grammar built upon a
CF-grammar of English containing, in addition, a transformational rule,
which I shall not specify here, allowing the derivation of

NP1 + Ra + NP3 + Vt + Vt + NP2

from

NP1 + Vt + NP2

and

NP3 + Vt + NP1.

(There is no need to stress that all this is only a very rough approximation
to the incomparably more refined treatment which a full-fledged
transformational grammar of English would require. The transformational
rule, for instance, should refer to the trees representing the strings under
discussion rather than to the strings themselves.) It is worthwhile noticing
that the node/terminal-node ratio (7/5) of the resulting tree is smaller
than the ratio (5/3) of the underlying trees.


The fifth aspect of syntactic complexity is, then, transformational
history. I am, of course, not using the term "measure" now, because it
is very doubtful whether measures can be usefully assigned to this
concept. So far, no attempt in this direction has been made. I shall,
therefore, say no more about this notion here.


It is not particularly difficult to develop these five notions, and
many more could be thought of. The decisive questions are twofold: What
are the exact formal properties of the various notions and, perhaps even
more important, what is their psychological reality, to use a term of
Sapir's? In general, one would tend to require that if one sentence is
syntactically more complex than another, then, ceteris paribus, it should,
perhaps only on the average, create more difficulties in its comprehension.
What can we say on this point?

Well, very little, and nothing so far under controlled experimental
conditions. Highly nested constructions just don't occur at all in normal
speech and very rarely in writing, with the notable exception of logical
or mathematical formulas. Their syntactic structure can be grasped only by
using extraordinary means such as going over them more than once and using
special markers for pairing off expressions that belong together but between
which other expressions have been nested. A formula such as

[[p ⊃ [q ⊃ [[r ⊃ [s ⊃ t]] ⊃ u]]] ⊃ v]

is certainly not a very complex one among the formulas of the propositional
calculus, as they go, but testing its well-formedness would either require
some artificial aids, such as the use of a pencil for marking off paired
brackets, or the acquisition of a special algorithm based upon a particular
counting procedure, or else just an extraordinary (and unanalysed) effort
and concentration. It is doubtful whether any effort, without external
aids, would suffice to determine that the "literal" English rendition of
the formula as

If if p then if q then if if r then if s then t then u then v

is well-formed, when one listens to such a sentence without prior warning.
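
A counting procedure of the kind alluded to can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: the
character '>' stands in for the horseshoe, the names bracket_profile and
literal_english are invented here, and the count checks only that brackets
pair off, which is a necessary but not a sufficient condition for
well-formedness.

```python
FORMULA = "[[p>[q>[[r>[s>t]]>u]]]>v]"  # '>' stands in for the horseshoe


def bracket_profile(formula):
    """Scan left to right, adding 1 for '[' and subtracting 1 for ']'.
    Return the largest count reached, or None if the brackets do not
    pair off."""
    count = deepest = 0
    for symbol in formula:
        if symbol == "[":
            count += 1
            deepest = max(deepest, count)
        elif symbol == "]":
            count -= 1
            if count < 0:
                return None
    return deepest if count == 0 else None


def literal_english(formula):
    """Render '[' as 'if' and '>' as 'then'; closing brackets have no
    spoken counterpart and simply disappear."""
    return (formula.replace("[", "if ")
                   .replace(">", " then ")
                   .replace("]", ""))


print(bracket_profile(FORMULA))
print(bracket_profile(FORMULA[:-1]))
print(literal_english(FORMULA))
```

The last function makes the listener's predicament visible: the closing
brackets, which carry the pairing information, leave no trace in the spoken
rendition.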

It is interesting that in order to explain our difficulties in either
uttering or grasping the structure of such sentences we need assume nothing
more than that we are finite automata with a finite number of internal
states. For Chomsky [36], in effect, has shown that when the number of
these states is some number n, then, relative to a given grammar G, there
exists a number m (depending on n) such that this device will not be able
to correctly analyse the syntactic structure of all sentences whose degree
of nesting is greater than or equal to m. (As a matter of fact, Chomsky
showed this for degree of self-embedding rather than for nesting, but the
proof can be trivially extended to this case.)

On the incomparably stronger assumption that natural languages (such
as English) can be adequately determined by tree grammars, that human
speakers of such a language have at least one such tree grammar stored in
their permanent memory, that they utter the sentences of these languages
by going through (one of) their tree(s) "from top to bottom and from left
to right," that all storage required for this process is done in an
immediate memory of the push-down store form containing, say, n cells, we
arrive at the conclusion that only sentences whose depth of postponed
symbols is no higher than n can be uttered by such speakers.
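
The production model described in this paragraph can be simulated directly.
The Python sketch below is a simplified illustration, not Yngve's own
formulation: trees are nested tuples, the names produce and can_utter are
invented here, and it is assumed that only the postponed symbols, and not
the symbol currently being expanded, occupy cells of the store.

```python
BINARY = ("S", ("NP", "John"), ("VP", ("Vt", "loves"), ("NP", "Mary")))
FLAT = ("S", ("NP", "John"), ("Vt", "loves"), ("NP", "Mary"))


def produce(tree):
    """Expand `tree` top-down and left-to-right with a push-down store.
    Return the words uttered and the largest number of symbols that had
    to wait in the store while another symbol was being handled."""
    store, words, most_postponed = [tree], [], 0
    while store:
        symbol = store.pop()                    # symbol now being handled
        most_postponed = max(most_postponed, len(store))
        if isinstance(symbol, str):
            words.append(symbol)                # a word is uttered
        else:
            store.extend(reversed(symbol[1:]))  # leftmost child ends on top
    return words, most_postponed


def can_utter(tree, cells):
    """True if the production never needs more than `cells` store cells."""
    return produce(tree)[1] <= cells


print(produce(BINARY))
print(produce(FLAT))
print(can_utter(FLAT, 1))
```

On this reading, a speaker with a single cell could utter "John loves Mary"
according to the binary analysis but not according to the flat one.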

Now, though Yngve continues to believe that there exists good evidence
for the soundness of these assumptions, Chomsky has on various occasions
[37, 38] expressed his doubts as to this evaluation of the evidence. He
believes that most of the positive evidence invoked by Yngve can already be
explained on the basis of the weaker assumption mentioned above, whereas he
mentions the existence of other evidence which tends to refute Yngve's
stronger assumptions though not his own weak one. I have no time to go
further into this controversy. Let me only state that Chomsky's arguments
seem to me to be the more conclusive ones. This, of course, by no means
diminishes the credit due to Yngve for having been the first to have raised
certain types of questions that were never asked before, and to have
ventured to provide for them interesting answers, though they may well turn
out to be the wrong ones.

It is time now to say at least a few words on the "Wittgensteinian
Thesis." In one sense, this thesis is of course perfectly true: After
all, all of us do manage to say most of what we have to say in sentences of
a low degree of nesting and, if really necessary, could rephrase even those
things for the expression of which we do use highly nested strings, such as
occur in many mathematical formulas, in syntactically less complex ways,
which will be presently investigated. But in this sense, the thesis is no
more than a rather uninteresting truism. What Wittgenstein, Broad and the
innumerably many other people who invoked this slogan doubtless had in mind
was that most, if not all, of the things that are expressed (usually, by
such and such an author, by such and such a cultural group, etc.) by
sentences with high syntactic complexity could have been expressed with
sentences of lower syntactic complexity, without any compensation. In this
interesting interpretation, Wittgenstein's Thesis seems to me wrong, almost
demonstrably so.

I would, on the contrary, want to express and justify, if not really
demonstrate, the following "Anti-Wittgensteinian Thesis": For many
languages, certainly for all interesting (sufficiently rich) ones, there
are things worth saying which cannot be expressed in sentences with a low
degree of syntactic complexity, without a loss being incurred in other
communicationally important aspects.

Though a fuller justification will have to be postponed for another
occasion, let me make here the following remarks. Consider one of the
simplest calculi ever invented by logicians, the so-called implicational
propositional calculus [39, p. 140]. We are here interested only in its
rules of formation but not in its axioms or theorems.

The rules of formation of one of the many formulations of this calculus
are as follows: Its primitive symbols are the three improper symbols

], ⊃, [

and the infinitely many proper symbols

p1, p2, ... .

Its rules of formation are just the following two:

F1. Each proper symbol is well-formed (wf).

F2. Whenever α and β are wf, so is [α ⊃ β]

(with the understanding that nothing is wf unless it is so by virtue of
F1 and F2). There exists no bound to the degree of nesting of the wf
formulas of this calculus, as is obvious from the series of wf formulas

p1, [p1 ⊃ p2], [[p1 ⊃ p2] ⊃ p3], [[[p1 ⊃ p2] ⊃ p3] ⊃ p4], ... .

It is less obvious, but can at any rate be rigorously proved, that for none
of these formulas does there exist in the calculus another formula which is
logically equivalent to it but has a lesser degree of nesting. (The term
"logically equivalent" needs explanation in our context, but I shall
nevertheless not provide it. For logicians the required explanation would
be rather obvious, for non-logicians it would take too much time.)
Wittgenstein's Thesis does not hold in this calculus.
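
The two rules of formation translate directly into a recursive recognizer.
The following Python sketch is an illustration only. It writes '>' for the
horseshoe and p1, p2, ... for the proper symbols, and the names parse,
wf_depth and left_nested are assumptions introduced here. The depth returned
is the number of nested applications of F2, used as a stand-in for the
degree of nesting, and the left-nested series is one series of growing
depth, not necessarily the one printed in the lecture.

```python
import re

TOKEN = re.compile(r"p\d+|[\[\]>]")  # proper symbols and improper symbols


def parse(tokens, i=0):
    """Recognize one wf formula starting at tokens[i]. Return the number
    of nested applications of F2 and the index after the formula."""
    if i < len(tokens) and tokens[i].startswith("p"):   # rule F1
        return 0, i + 1
    if i < len(tokens) and tokens[i] == "[":            # rule F2
        left, i = parse(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ">":
            raise ValueError("'>' expected")
        right, i = parse(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != "]":
            raise ValueError("']' expected")
        return 1 + max(left, right), i + 1
    raise ValueError("formula expected")


def wf_depth(text):
    """Depth of `text` if it is wf by F1 and F2, otherwise None."""
    tokens = TOKEN.findall(text)
    if "".join(tokens) != text:       # symbols foreign to the calculus
        return None
    try:
        depth, end = parse(tokens)
    except ValueError:
        return None
    return depth if end == len(tokens) else None


def left_nested(n):
    """Return p1, [p1>p2], [[p1>p2]>p3], ... up to n proper symbols."""
    formula = "p1"
    series = [formula]
    for k in range(2, n + 1):
        formula = f"[{formula}>p{k}]"
        series.append(formula)
    return series


for formula in left_nested(4):
    print(formula, wf_depth(formula))
```

Each further member of the series adds one more level, so no fixed bound on
the depth can hold for all wf formulas.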

Consider now the (logically utterly uninteresting) conjunctional
propositional calculus, whose rules of formation are analogous to those of
the implicational calculus, except that '⊃' is to be replaced by '∧'
in both the list of improper symbols and F2. Here, too, it can be shown,
by a somewhat more complicated argument, that for each n there exist
wf formulas whose degree of nesting is higher than n such that they are
not logically equivalent to any wf formula with a lesser degree of nesting.

But there exists the following interesting difference between the two
calculi: The conjunctional calculus, as presented here, looks unduly
complex. Since conjunction is "associative," i.e., since [p1 ∧ [p2 ∧ p3]]
and [[p1 ∧ p2] ∧ p3] are equivalent, the brackets fulfill no semantically
important function within the calculus and could as well have been omitted
from the list of improper symbols, with a corresponding simplification in
rule F2. In this version, all wf formulas would have had a degree of
nesting 0, as can easily be verified. True enough, all formulas with at
least two conjunction signs would have become syntactically ambiguous, but,
in this particular calculus, syntactic ambiguity would not have entailed
semantic ambiguity. Syntactic simplification could have been achieved, and
in the most extreme fashion, without any semantic loss whatsoever.

This is by no means the case for the implicational calculus.
Implication is not associative, so that the syntactic ambiguity introduced
by omission of brackets would have entailed semantic ambiguity, a price no
logician could possibly be ready to pay in this connection, though again
all resulting formulas would have got a degree of nestedness 0.

(As for the conjunctional calculus, as soon as it is combined with
some other calculus, say the disjunctional calculus, omission of brackets
would again entail semantic ambiguity, since, say, [p1 ∧ [p2 ∨ p3]] and
[[p1 ∧ p2] ∨ p3] are not equivalent.)
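
The three claims about associativity can be verified by brute force over
all truth-value assignments. The Python sketch below is merely such a check;
the function names are invented here, and the two-valued truth tables are
the usual ones for conjunction, disjunction and material implication.

```python
from itertools import product


def implies(a, b):
    """Material implication."""
    return (not a) or b


def equivalent(f, g, variables=3):
    """Compare two truth functions on every assignment of truth values."""
    return all(f(*row) == g(*row)
               for row in product((False, True), repeat=variables))


def and_right(p1, p2, p3):
    return p1 and (p2 and p3)


def and_left(p1, p2, p3):
    return (p1 and p2) and p3


def imp_right(p1, p2, p3):
    return implies(p1, implies(p2, p3))


def imp_left(p1, p2, p3):
    return implies(implies(p1, p2), p3)


def mixed_right(p1, p2, p3):
    return p1 and (p2 or p3)


def mixed_left(p1, p2, p3):
    return (p1 and p2) or p3


print(equivalent(and_right, and_left))      # conjunction alone
print(equivalent(imp_right, imp_left))      # implication alone
print(equivalent(mixed_right, mixed_left))  # conjunction with disjunction
```

Only the first comparison succeeds, so only in the pure conjunctional
calculus can the brackets be dropped without semantic loss.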



For those of you who have heard of the so-called Polish bracket-free
notation, let me add the following remark. One might have thought that the
nesting (which in this particular case is also self-embedding) is due to
the use of brackets for scoping purposes, in accordance with standard
mathematical usage, since it seems that the brackets "cause" the branchings
to be "inner" ones, and might therefore have cherished the hope that a
bracket-free notation would eliminate, or at least reduce, nesting. But
this hope is illusory. Inner branching, thrown out through the front door,
would re-enter through the back door. With 'C' as the only improper symbol
and F2 changed to: Whenever α and β are wf, so is Cαβ, expansion of α
(though not of β) causes inner branching. Notice further that in Polish
notation calculi you cannot introduce syntactic ambiguity, harmless or
harmful, even if you want to, by omitting symbols, since there are no
special scoping symbols to omit.
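
The relation between the two notations can be made concrete with a small
Python sketch. It is an illustration under assumptions: proper symbols are
single lower-case letters, '>' stands in for the horseshoe, and the names
to_polish and from_polish are invented here.

```python
FORMULA = "[[p>[q>[[r>[s>t]]>u]]]>v]"  # '>' stands in for the horseshoe


def to_polish(formula):
    """[a>b] becomes Cab: every '[' turns into 'C', while '>' and ']'
    are dropped."""
    return formula.replace("[", "C").replace(">", "").replace("]", "")


def from_polish(polish, i=0):
    """Rebuild the bracketed formula that starts at polish[i]. Return the
    formula and the index after it. There is exactly one way to read each
    symbol, so no choice ever arises."""
    if polish[i] == "C":
        left, i = from_polish(polish, i + 1)
        right, i = from_polish(polish, i)
        return f"[{left}>{right}]", i
    return polish[i], i + 1


polish = to_polish(FORMULA)
print(polish)
print(from_polish(polish)[0] == FORMULA)
```

The round trip recovers the original brackets, which shows that the scoping
information is still present in the bracket-free string. In the recursive
call for the left operand, the right operand is still pending, which is
where the inner branching reappears.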

As far as natural languages are concerned, the situation is much more
confused. In speech, it seems that we can express distinctions of scope
up to a degree of nesting of 3, anything beyond that becoming blurred,
whereas in writing things are still worse, punctuation marks not being
consistently used for scoping purposes and anyhow not being adequate for
this task, with the result that syntactic ambiguities abound, which may or
may not be reduced through context or background knowledge. Sometimes, when
the resulting semantic ambiguity becomes intolerable, extraordinary measures
are taken, such as using scoping symbols like parentheses in ways ordinarily
reserved for mathematical formulas only, indentation at various depths,
ad hoc abbreviations, etc.

Natural languages have many so-to-speak built-in devices for syntactic
simplification. These devices, and their effectiveness, are badly in need
of further study, after the extremely interesting beginnings by Yngve [30].

Certain "simplifications," beloved by editors who are out to split up
involved sentences, may well turn out to be spurious and perhaps even
downright harmful, in spite of appearances. An editor who rewrites an
author's "Since p and q and r, therefore s." (where you have to imagine the
letters p, q, r, and s replaced by sentences which on occasion will
themselves have considerable syntactic complexity) by "p. q. r. Therefore
s." is probably under the illusion that he has simplified something and
therefore improved something. Now, he has doubtless replaced one long
sentence with a degree of syntactic complexity of, say, n, by four shorter
sentences, each with a degree of syntactic complexity of at most n-1, and
has even used three words less for this purpose. But there is a price
connected with this procedure, even a twofold one. First, the word
"therefore" has become semantically much more indefinite. What for: "s, for
r.", or "s, for q and r.", or "s, for p and q and r."? (And this might not
be all. p will be preceded by other sentences, so that, at least from a
purely syntactic point of view, it is totally indefinite how far back one
has to go in the list of possible antecedents to s.) Secondly, even if the
exact antecedent is settled, in order to understand the full content of the
argument and to judge its validity, the reader (or listener) will have to
recall, or re-read, the antecedent (which, so let us speculate, might have
been removed into some larger, more permanent and less easily accessible
storage than the immediate memory it was occupying during the syntactic
processing), with the result that the overall economy of the "improvement"
is, to say the least, very doubtful. There is at least a good chance that
the total effort required of the receiver of the message will be higher in
the case of the "split-up" sentence than with regard to the original
sentence, though it might well be easier on the sender, had he wanted to
express himself originally in this less definite way. (I used to teach
geometry in high school and still remember the type of student who, when
required to demonstrate a certain theorem, would start rattling off a list
of congruences or inequalities, as the case might be, and finish with a
triumphant "Therefore (or, "From this it follows that") ...". And he was
not even wrong. Because from his list, and in accordance with certain
theorems already proved, his conclusion did indeed follow. Except that he
left the task of finding out how, in detail, the conclusion followed from
the premises, to the listeners, including myself in that case, and provided
no indication of the fact that he himself knew the details.)

An investigation recently begun in Jerusalem seems to lead to
interesting results as to the mutual relationships between (semantic)
equivalence among the sentences of a given formal system, the (syntactic)
simplicity of these sentences and the existence of a recursive
simplification function for this system. The results will be published in a
forthcoming Technical Report. Let me only mention here one of the more
significant results. (I hope to nobody's particular surprise.) The
existence of a syntactic simplification algorithm is rather the exception,
and the proof of such existence, if at all, will in general require that
the system fulfill fairly tough conditions. The details, unfortunately,
require a good knowledge of recursive function theory and shall therefore
not be given here.


THIRD LECTURE: LANGUAGE AND SPEECH; THEORY VS. OBSERVATION IN LINGUISTICS


As already mentioned in the opening sentence of my first lecture, many of
us believe that during the last few years we have gained valuable insights into
the relationship between theory and observation in science. I myself have
already tried on a few occasions to apply these insights to certain
controversial issues of modern linguistics [40, 41]. I would now like to do the
same with regard to the central term of linguistics, namely 'language' itself.
As you will soon realise, this methodological point is of vital importance for
the so-called "research methodology" in MT, and insufficient understanding of
it has already caused superfluous controversies.

The term 'language' has, of course, been "defined" innumerably many times,
but the fact that these definitions are usually mutually inconsistent, at least
at first sight, has equally often been forgotten and neglected, so that
seemingly contradictory statements about 'language' were usually interpreted as
inconsistent statements about the same explicatum (in Carnap's terminology)
rather than consistent statements about different explicata.

You will, for instance, find in the literature that language has often
been treated as a set of sentences (or utterances, which two terms will not
be distinguished for the moment). This, of course, is an abstraction from
ordinary usage, and has been recognised as such. Leaving aside for our
present purposes the discussion of how good and useful this abstraction is, let
me point out that the characterisation can be understood (and has been
understood) in at least the following five senses:

(1) A given set of utterances, such as recorded on a certain tape by
so-and-so on such-and-such an occasion, or of inscriptions, found on
such-and-such a tablet. Such sets are, of course, finite and most of them
contain relatively few members. They can be, and sometimes are, represented as
lists, under certain transcriptions. As a matter of fact, such sets are only
exceptionally called 'languages', the more usual term being 'corpus'.

(2) The set of all utterances (spoken and/or written) made until July
1962, say, by the members of such-and-such a community during their
lifetime until then. This set is certainly finite, too, but cannot, in general,
be presented in list form and is rather indefinite, due to the indefiniteness
of the term 'community' and for dozens of other obvious reasons, such as those
centering around idiolects, dialects, bilingualness, not to forget the
vagueness of 'utterance' itself.

(3) The set of all utterances, past, present, and future, made by members
of such a community. This set differs from that treated under (2) only in
having a still greater degree of indeterminacy.

(4) The set of all "possible" utterances of a certain kind. The notion
"possible" occurring in this characterisation is notorious for its complexities
and philosophical perplexities, and I trust I shall be forgiven if I don't go
any deeper into this hornets' nest here. Under most conceptions, this set
will turn out to be infinite.

(5) The set of all "sentences" (well-formed expressions, grammatical
expressions, etc.). (For recent discussions of this and related hierarchies
see, e.g., Quine [42] and Ziff [43].)

It is true, of course, that (1) is a subset of (2), which again is a
subset of (3), but this is not the crucial point. Much more important is
that the term 'utterance' occurring in their characterisation changes its
meaning in the transition from (3) to (4), becomes less observational and
more theoretical. At the same time, there is a change from a concrete,
physical, three- or four-dimensional entity, a "token," in Peirce's
terminology, to an abstract entity, a "type." (When Paul and John say
"I am hungry.", we have two members of the set (1), since they uttered two
different "utterance-tokens," but only one member of the set (4), since these
tokens are replicas of the same utterance-type.) The elements of set (5),
finally, are so overtly theoretical that the term 'utterance' seemed definitely
inappropriate for them, and I had to shift to the term 'sentence'. Though these
two terms in ordinary usage, as well as in the usage of most linguists, are
almost synonymous, I have already suggested once before [41] to distinguish
artificially between them qua technical terms and use 'utterance' for
observational entities and 'sentence' for theoretical ones (with the adjective
'possible' performing as a category-shifting modifier, an extremely important
and not fully analysed semantical fact). That 'sentence' is ordinarily used
in both these senses, as is 'word' and many other terms of this area, is,
of course, one of the major sources of confusion and futile controversies.
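
The token/type distinction can be made concrete with a small sketch. It is
offered only as an illustration in Python; the record layout (a speaker paired
with a wording) and the function names are assumptions of this example and are
not part of the lecture. Each recorded utterance-token is a separate event,
while an utterance-type is what distinct tokens with the same wording share.

```python
from collections import Counter

# Hypothetical corpus in the sense of set (1): each entry is one
# utterance-token, i.e. one concrete speech event by one speaker.
corpus_tokens = [
    ("Paul", "I am hungry."),
    ("John", "I am hungry."),
    ("Paul", "I am tired."),
]


def utterance_types(tokens):
    """Collapse tokens into types by ignoring who produced them."""
    return {wording for _speaker, wording in tokens}


def replicas_per_type(tokens):
    """Count how many tokens are replicas of each type."""
    return Counter(wording for _speaker, wording in tokens)


token_count = len(corpus_tokens)
type_count = len(utterance_types(corpus_tokens))
```

Under these assumptions Paul's and John's "I am hungry." are counted twice
among the tokens but only once among the types.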

Sets (2) and (3) have little linguistic importance. Because of their
indefiniteness it is difficult to make interesting statements about them. Sets
(4) and (5) -- in all rigor I should have spoken about the classes of sets (4)
and (5) -- are by and large identical, at least under certain plausible
interpretations of 'possible', the characteristic of (4) being what Carnap [44]
called "quasi-psychologistic," while (5) is presumably characterised in an
overtly and purely syntactical fashion.

In many linguistic circles, it has been standard procedure to make believe
that linguists, in their professional capacity, are dealing with sets of type
(1) (or of types (2) or (3)). This fiction gave their endeavor, so they
believed, a closeness-to-earth, an operational solidity which they were anxious
not to lose. In fact, they all, with hardly an exception, dealt with sets of
types (4) or (5). All the talk about "corpora" was only lip-service. Today
we know that no science worth its salt could possibly stick to observation
exclusively. Whoever is out to describe and nothing else will not describe
well. Theorizare necesse est. Though I don't think that it is necessary,
or even helpful, to say that every description already contains theoretical
elements -- as some recent methodologists are fond of stressing -- it must
be said that theorophobia is a disease, fashionable as it might be. All
scientific statements must surely be connected with observations, but this
connection can, and must, be much more oblique than many methodological
simplicists believe.

Returning from these generalities to our present problem of the relation
between language and speech -- with MT hovering in the back as a kind of
proving ground -- it should be superfluous to insist that the proper business
of the theoretical linguist is to describe not the actual linguistic performance
of some individual (or of so many individuals) -- this "natural history" stage
being of limited interest only -- but his linguistic competence (or that of a
certain community of individuals), to use a dichotomy that has recently been
much stressed by Miller and Chomsky [35]. Now competence is a disposition,
perhaps even a higher-order disposition. To be a competent native speaker of
English means not just to have performed in the past in a certain way, not even
that he will (in all likelihood) perform in a certain way when presented with
certain stimuli, but rather that one would perform, or would have performed (in
all likelihood), in a certain way, were he to be presented (or had he been
presented) with certain stimuli -- in addition to many other things. I know
perfectly well that no competent English speaker will ever in his life be
presented with a certain utterance consisting of a few billion words, say of
the form "Kennedy is hungry, and Kruschev is thirsty, and De Gaulle is tired,
... , and Adenauer is old.", going over the whole present population of the
world, but I know, and everybody else knows perfectly well, that were such a
speaker, contrary to fact, to be presented with such an utterance, he would
understand it as a perfect specimen of an English sentence.

There is no mechanical procedure to move from someone's performance to his
competence, just as there is no mechanical procedure to move from any number
of physical observations to a physical theory. But just as this fact does not
free the physicist from his professional obligation to develop theories, so
there is nothing to absolve the linguists from presenting theories of
linguistic competence. Testing the validity of these theories will, again as in
the other theoretical sciences, in general proceed not in any straightforward
way but by standard indirect methods. That John is competent to understand a
certain ten-billion-word sentence will not be tested by presenting John with a
token of this sentence, but, as we all know, by entirely different, oblique
methods. For the above sentence, for instance, it would suffice to find out
that John understands such sentences as "Paul is hungry." and "David is
thirsty." as well as that he has mastered the rule that whenever α and β are
sentences, α followed by 'and' followed by β is a sentence. This latter finding
might not be a very simple one or a very secure one, but we do often claim to
have found out just such things.
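
The conjunction rule just cited can be sketched as a tiny recogniser. This is
a minimal illustration in Python under stated assumptions, not a grammar of
English: the word lists, the function names, and the omission of final periods
are all inventions of this example. What it shows is that a finite stock of
atomic sentences together with one recursive rule accepts sentences of
unbounded length, so that mastery of the rule can be checked on short cases.

```python
# Hypothetical mini-lexicon; the names and predicates are illustrative only.
NAMES = {"Paul", "David", "John"}
PREDICATES = {"hungry", "thirsty", "tired", "old"}


def is_atomic(clause: str) -> bool:
    """Base case: a clause of the form 'NAME is PREDICATE'."""
    words = clause.split()
    return (
        len(words) == 3
        and words[0] in NAMES
        and words[1] == "is"
        and words[2] in PREDICATES
    )


def is_sentence(text: str) -> bool:
    """Rule: if a and b are sentences, so is a + ' and ' + b.

    Because 'and' never occurs inside an atomic clause of this toy
    lexicon, the recursive rule reduces to checking that every clause
    between two occurrences of 'and' is atomic.
    """
    return all(is_atomic(clause) for clause in text.split(" and "))


def conjoin(*sentences: str) -> str:
    """Build a longer sentence by repeated use of the conjunction rule."""
    return " and ".join(sentences)
```

A sentence built with conjoin from a hundred thousand atomic clauses is
accepted by the same two definitions that accept "Paul is hungry and David is
thirsty"; nothing in the rules mentions a maximum length.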

One often hears, in certain philosophical circles as well as among people
interested in applied linguistics, statements to the effect that natural
languages have no grammar. These people are aware of the paradoxical character
of such statements, but nevertheless insist that they are true, and even
trivially so. Every grammar, so they say, determines a certain fixed, "static,"
set of sentences. But a natural language is a living affair, "dynamic,"
constantly in change, and it is utterly impossible that the set of sentences
should coincide with the set of utterances, as it should for an adequate
grammar. It should now be obvious where the fallacy lies in this argument: in
the unthinking identification of sentences and utterances, and in the complete
misunderstanding of the relation between theory and observation. It is as if
one wanted to argue that natural gases obey no physical laws, since these
laws apply only to the fictitious "ideal gases." (Incidentally, such
statements have indeed been made by obscurantists at all times.) To understand
the exact relationship between the laws of gases of theoretical physics and the
behavior of real gases requires a lot of methodological sophistication, and no
less should be expected for the understanding of the exact relationship between
the grammatical rules of an artificial language and the utterances made by the
members of the community speaking this language. Any naive identification will
quickly result in paradox, futile discussions, and irrational distrust of theory.

That the question of the adequacy of a given grammar is much more complex
than ordinarily assumed does not mean that this question is a pointless one.
On the contrary, since there exists no simple criterion for deciding which of
two proposed grammars is "better," more adequate than the other, the problem of
finding some criterion, however partial and indirect, becomes of overwhelming
importance. The fact is, of course, that extremely little is known here
beyond programmatic declarations. We know that "grammatical" should not be
identified with "comprehensible," nor is one of these concepts subsumed under
the other, but neither are these two concepts incommensurable. In that
connection we have the large complex of questions arising around degrees of
grammaticalness, deviancy, oddness, and anomaly, all of vital importance to
linguists and philosophers alike. Some of you know the valiant beginnings
made toward an investigation of this problem by Chomsky, Ziff [43] and others,
but it will, I hope, not deter you from following in their footsteps, if I
state, rather dogmatically, that these attempts are woefully inadequate, while
admitting that I have nothing better to offer, for the moment.

As soon as it is understood that competence and performance are to be kept
clearly apart, one will no longer be tempted to feel oneself obliged to impose
upon, say, the English language a grammar which will not allow the generation
of sentences of a higher degree of syntactic complexity than some small number,
say 4, according to one or another of the measures discussed in the previous
lecture. True enough, "corresponding" utterances are not normally found in
speech or writing, and if artificially produced will not be grasped unless
certain artificial auxiliary means are invoked. These limitations of human
performance are doubtless of vital importance; have to be clearly stated and
investigated; and should, sooner or later, be backed up by some
neurophysiological theory. They are of equal importance for the programming of
machines which are charged with determining the syntactic structure of all
sentences of any given text of a given language. That sentences of a high
degree of complexity can be disregarded for this purpose, because of their
extreme rarity or just plain non-occurrence, may allow an organisation of the
computer's working space that could make all the difference between the
economically feasible and the economically utopian. But in order to do all
this, it is by no means necessary to impose these restrictions on the grammar
of English as such. Nothing is gained, and much is lost. Not only will certain
arbitrary-looking restrictions on the recursive generation rules have to be
imposed, thereby increasing the complexity of the grammar to a degree that can
hardly be estimated at present, but this procedure is self-defeating. It is
done in the name of "sticking to the brute facts," but doing so in such a crude
way will force the adherents of this approach to disregard other brute facts,
such as that with the aid of certain auxiliary means, the syntactic structure
of English word sequences of a degree of syntactic complexity of 5, or of 100
for that matter, will be perfectly grasped. Since these word sequences are not
English sentences, according to the grammarians of performance, how come they
are understood and what is the language they belong to?

This does not mean, of course, that restrictions of performance will not
reflect themselves in the grammar. I am convinced, e.g., that Professor
Yngve has made a remark full of insight when he noticed and stressed the fact
that by changing its mood from the active to the passive, the syntactic
complexity of a given sentence can be reduced. And I have no objection to
formulating this insight in the form that there exists a passive in English
(and the same or other devices in other languages) in order to allow, among
other things, the formulation of certain thoughts in sentences of a lower
degree of complexity than would otherwise have been possible. But trying to
obliterate the distinction between competence and performance, to say it for
the last time, is only a sign of confusion and will breed further confusion.
The sooner we get rid of these last traces of extreme operationalism, the
better for all of us, including MT research workers.

In order to describe and explain the facts of speech exhaustively and
revealingly, a full-fledged, formal theory of language is needed, among many
other things. Philosophical prejudice aside, there is no particular merit in
keeping this theory "close to the facts," in assuming that the rules of
correspondence which connect the theory (in the narrower sense of the word)
with observation will have a particularly simple form. Experience from other
sciences should have taught us that such an assumption is baseless. Physics,
e.g., has reached its present heights only because the free flight of fancy,
"the free play of ideas," has not been fettered by a narrow conception of
scientific methodology. True enough, the particular logical status of these
rules of correspondence has still not been deeply enough investigated, and I
fully understand the attitude of those who, for this reason, regard this whole
business with suspicion, and are afraid that the free flight of fancy will
reintroduce uncontrollable metaphysics into science in general and linguistics
in particular. But I hope that the necessary controls will be developed and
better understood in the future and that in the meantime one will manage
somehow. Occasional metaphysical aberrations are probably less damaging in the
long run than the curtailment of creative scientific imagination.

Let me stress, in this connection, that the extensive use of symbolism
in the formulation of generative grammars has induced many linguists to accuse
the authors of these formulations of having lost all connection with
empirical science and indulged instead in some mathematical surrogate. I hope
that it is now perfectly clear that this accusation is baseless. A formal
grammar of English is an empirical theory of the English language, and its
symbolic formulation, while it increases its precision and therefore its
testability, by no means turns it into a mathematical theory. When according
to a certain grammar "Sincerity admires John." turns out not to be a (formal)
sentence whereas this very sequence is considered by someone to be an
(intuitive) sentence, then this grammar is to that degree inadequate to his
intuitions. It should only be kept in mind that the determination of the
intuitive sentencehood of "Sincerity admires John." is by no means such a
straightforward affair of observation, experimentation and statistics as some
people believe. The notion of "intuitive sentence" is highly theoretical itself
(though without the benefit of a complete theory being formulated to back it
up, which fact is, of course, the whole crux of this peculiar modifier
'intuitive'), and observations on utterances of people or their reaction to
utterances alone will never settle in any clear-cut way the question of the
sentencehood of a particular word sequence. This is as it should be, and only
wishful thinking and naive methodology make people believe otherwise.
Confirmation and refutation of linguistic theories, as of theories in any other
science, is not such a simple operation as one is taught to believe in high
school. But the complexity of refutation does not make a linguistic theory
empirically irrefutable and therefore does not turn it into a mathematical
theory.


FOURTH LECTURE: WHY MACHINES WON'T LEARN TO TRANSLATE WELL


My arguments against the feasibility of high-quality fully-automatic
translation can be assumed to be well known in this audience. I have gone
through them often enough in lectures and publications. I also have the
impression that, after occasionally rather strong initial negative reactions, a
good number of people who have been active in the field of MT for some years
tend more and more to agree with these arguments, though they might prefer a
more restrained formulation. On the other hand, the number of research groups
which have taken up MT as their major field of activity is still on the
increase, and by now there is hardly a country left in Europe and North America
which does not feature at least one such group, with Japan, China, India and
a couple of South American countries joining them, for good measure. Though
a certain amount of involvement in MT, and in particular in its theoretical
aspects, is certainly helpful and apt to yield fresh insights into the
workings of language, most of the work that is at present going on under the
auspices of MT seems to me to be a wanton expenditure of research money that
could be put to better use in other fields and, still worse, a deplorable waste
of research potential.

The continued interest in MT is sometimes defended on the grounds that
though it is indeed extremely unlikely that computers working according to rigid
algorithms will ever produce high-quality translations, there still exists a
possibility that computers with considerable learning ("self-organising")
abilities will be able through training and experience to improve their initial
algorithms and thereby constantly improve their output until adequate quality
is achieved. I myself mentioned the possibility in some prior publications
but refrained from evaluating it, at that time regarding such an evaluation
as premature [15, 45].

During  the  last  two  years,  however,  while  going  through  the  pertinent 
literature  once  more  and  pondering  over  the  whole  issue  of  artificial  intel¬ 
ligence,  I  came  to  more  radical  conclusions  which  I  would  like  to  expose  and 
defend  here.  Today,  I  am  convinced  that  even  machines  with  learning  abilities, 
as  we  know  them  today  or  foresee  them  according  to  known  principles,  will 
not  be  able  to  improve  by  much  the  quality  of  the  translation  output. 




For this purpose, let us notice once more the obvious prerequisites
for high-quality human translation. There are at least the following five
of them, though deeper analysis would doubtless reveal more:

(1)  competent  mastery  of  the  souroe  language, 

(2)  competent  mastery  of  the  target  language, 

(3)  good  general  background  knowledge, 

(4)  expertness  in  the  field, 
and  (5)  intelligence  (know-how). 

(I admit, of course, that the last of these prerequisites, intelligence, is
not too well defined or understood, and shall therefore have to use it with
a good amount of caution.)

All this was surely common knowledge at all times, and certainly known
to all of us "machine translation pioneers" a dozen years ago. I knew then
that nothing corresponding to items (3) and (4) could be expected of
electronic computers, but thought that (1) and (2) should be within their
reach, and entertained some hopes that by exploiting the redundancy of natural
language texts better than human readers usually do, we should perhaps be in a
position to enable the computers to overcome, at least partly, their lack of
knowledge and understanding. True enough, scientists (and almost everyone
else) write their articles with a reader in mind who, in addition to having
a good command of the language, has a general background knowledge of, say,
college level, has so many years of study behind him in the respective field,
and is intelligent enough to know how to apply these three factors when
called upon to do so. But it could have been, couldn't it, that, perhaps
inadvertently, they do introduce sufficient formal clues in their publications
to enable a very ingenious team of linguists and programmers to write a
translation program whose output, though produced by the machine without
understanding, would be indistinguishable from a translation done out of
understanding? After all, cases are known of human translations that were
done under similar conditions and were not always recognised as such.

Well, it could have been so, but it just didn't turn out this way. For
any given source language, there are countless sentences to which a competent


and  not  fully  analysed  aaaantioal  fact).  That  'sentence'  is  ordinarily  used 
in  both  these  senses,  as  is  'word*  and  aany  other  tens  of  this  area,  is, 
of  ooune,  one  of  the  major  sources  of  confusion  and  futile  controversies. 

Sets  (2)  and  (3)  have  little  linguistie  importance.  Because  of  their 
indefiniteness  it  is  difficult  to  make  interesting  statements  about  them.  Sots 
(4)  and  (5)  —  in  all  rigor  I  should  have  spoken  about  the  classes  of  sets  (4) 
and  (5)  —  are  by  and  large  identical,  at  least  under  certain  plausible  inter¬ 
pretations  of  'possible',  the  obanoteristic  of  (4)  being  shat  Carnap  [44] 
called  "quaai-payahologistio,"  while  (3)  is  presumably  characterised  in  an 
overtly  and  purely  syntactical  fashion* 

Zn  many  linguistic  ol roles,  it  has  been  standard  procedure  to  make  believe 
that  linguists,  in  their  professional  capacity,  are  dealing  with  sets  of  type 
(l)  (or  of  types  (2)  or  (3)).  This  fiction  gave  their  endeavor,  so  they  be¬ 
lieved,  a  closeness- to-sarth,  4n  operational  solidity  which  they  were  anxious 
not  to  lose.  In  feet,  they  all,  with  hardly  an  exoeptlon,  dealt  with  sets  of 
types  (4)  or  (5) .  All  the  talk  about  "oorpora"  was  only  lip-service.  Today 
we  know  that  no  solsnoe  worth  its  salt  could  possibly  stick  to  observation 
exclusively.  Whoever  is  out  to  describe  and  nothing  else  will  not  describe 
well.  Theorisare  nsossae  set.  Though  I  don't  think  that  it  is  neoessary, 
or  even  helpful,  to  say  that  eyerv  description  already  contains  theoretical 
elements  —  as  some  reoent  methodologists  are  fond  of  stressing  —  it  must 
be  said  that  theoro phobia  is  a  disease,  fashionable  as  it  sight  be.  All  scien¬ 
tific  statements  must  surely  be  connected  with  observations,  but  this  connec¬ 
tion  can,  and  must,  be  much  more  oblique  than  many  methodological  aimplicista 
believe. 

Returning  from  these  generalities  to  our  present  problem  of  the  relation 
between  language  and  speeoh  —  with  KT  hovering  in  the  book  as  a  kind  of  prov¬ 
ing  ground  —  it  should  be  superfluous  to  insist  that  the  proper  business  of 
the  theoretical  linguist  is  to  describe  not  the  actual  linguistic  Mrfnm»nn» 
of  some  individual  (or  of  sc  aany  individuals)  —  this  "natural  history"  stage 
being  of  limited  interest  only  —  but  his  linguistie  competence  (or  that  of  a 
certain  oowunity  of  individuals),  to  use  a  dichotomy  that  has  recently  been 


-4- 


much  stressed  by  Miller  and  Chomsky  [35],  Now  competence  is  a  disposition,  per¬ 
haps  even  a  higher-order  disposition.  To  be  a  competent  native  speaker  of  Eng¬ 
lish  means  not  just  to  have  performed  in  the  past  in  a  certain  way,  not  even 
that  he  will  (in  all  likelihood)  perform  in  a  certain  way  when  presented  with 
certain  stimuli,  but  rather  that  one  would  perform,  or  would  have  performed  (in 
all  likelihood),  in  a  certain  way,  were  he  to  be  presented  (or  had  he  been  pre¬ 
sented)  with  certain  stimuli  —  in  addition  to  many  other  things.  I  know  per¬ 
fectly  well  that  no  competent  English  speaker  will  ever  in  his  life  be  presented 
with  a  certain  utterance  consisting  of  a  few  billion  words,  say  of  the  form 
"Kennedy  is  hungry,  and  Kruschev  is  thirsty,  and  Be  Gaulle  is  tired,  ...  ,  and 
Adenauer  is  old.”,  going  over  the  whole  present  population  of  the  world,  but  I 
know,  and  everybody  else  knows  perfectly  well,  that  were  such  a  speaker,  con¬ 
trary  to  fact,  to  be  presented  with  such  an  utterance,  he  would  understand  it 
as  a  perfect  specimen  of  an  English  sentence. 

There  is  no  mechanical  procedure  to  move  from  someone's  performance  to  his 
competence,  just  as  there  is  no  mechanical  procedure  to  move  from  any  number 
of  physical  observations  to  a  physical  theory.  But  just  as  this  fact  does  not 
free  the  physicist  from  his  professional  obligation  to  develop  theories,  so 
there  is  nothing  to  absolve  the  linguists  from  presenting  theories  of  linguis¬ 
tic  competence.  Testing  the  validity  of  these  theories  will,  again  as  in  the 
other  theoretical  sciences,  in  general  proceed  not  in  any  straightforward  way 
but  by  standard  indirect  methods.  That  John  is  competent  to  understand  a  cer¬ 
tain  ten-billion-word  sentence  will  not  be  tested  by  presenting  John  with  a 
token  of  this  sentence,  but,  as  we  all  know,  by  entirely  different,  oblique 
methods.  For  the  above  sentence,  for  instance,  it  would  suffice  to  find  out 
that  John  understands  such  sentences  as  "Paul  is  hungry."  and  "David  is  thirsty." 
as well as that he has mastered the rule that whenever α and β are sentences,
α followed by 'and' followed by β is a sentence. This latter finding might
not  be  a  very  simple  one  or  a  very  secure  one,  but  we  do  often  claim  to  have 
found  out  just  such  things. 
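
The conjunction rule just mentioned can be made concrete in a few lines. The following Python sketch is only an illustration under stated assumptions: the names `ATOMIC`, `conjoin` and `is_sentence` are hypothetical, and the small set of atomic sentences merely stands in for whatever simple sentences John has been found to understand. It shows how a finite check of the base cases, together with the one recursive rule, decides the sentencehood of arbitrarily long conjunctions without anyone ever being presented with a token of them.

```python
# Hypothetical base cases: simple sentences the speaker is known to understand.
ATOMIC = {"Paul is hungry", "David is thirsty", "Kennedy is hungry"}


def conjoin(parts):
    """Build one sentence from several by repeated use of the 'and' rule."""
    return ", and ".join(parts) + "."


def is_sentence(text):
    """Accept a string iff it is an atomic sentence or a conjunction of them.

    The recursive rule (if a and b are sentences, so is a + ', and ' + b)
    is associative, so a single pass over the conjuncts is enough.
    """
    body = text[:-1] if text.endswith(".") else text
    return all(part in ATOMIC for part in body.split(", and "))


# A conjunction of 100,000 clauses, far longer than any utterance ever produced.
long_sentence = conjoin(["Paul is hungry", "David is thirsty"] * 50_000)
```

Calling `is_sentence(long_sentence)` accepts the long conjunction on the strength of the base cases and the rule alone, which is the oblique kind of test described above; it says nothing, of course, about how mastery of the rule is to be established for a human speaker.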

One  often  hears,  in  certain  philosophical  circles  as  well  as  among  people 
interested in applied linguistics, statements to the effect that natural
languages have no grammar. These people are aware of the paradoxical character
of such statements, but nevertheless insist that they are true, and even
trivially so. Every grammar, so they say, determines a certain fixed, "static,"
set  of  sentences.  But  a  natural  language  is  a  living  affair,  "dynamic,"  con¬ 
stantly  in  change,  and  it  is  utterly  impossible  that  the  set  of  sentences 
should  coincide  with  the  set  of  utterances,  as  it  should  for  an  adequate  gram¬ 
mar.  It  should  now  be  obvious  where  the  fallacy  lies  in  this  argument:  in 
the  unthinking  identification  of  sentences  and  utterances,  and  in  the  complete 
misunderstanding  of  the  relation  between  theory  and  observation.  It  is  as  if 
one  wanted  to  argue  that  natural  gases  obey  no  physical  laws,  since  these 
laws apply only to the fictitious "ideal gases." (Incidentally, such
statements have indeed been made by obscurantists at all times.) To understand the
exact  relationship  between  the  laws  of  gases  of  theoretical  physics  and  the 
behavior  of  real  gases  requires  a  lot  of  methodological  sophistication,  and  no 
less  should  be  expected  for  the  understanding  of  the  exact  relationship  between 
the  grammatical  rules  of  an  artificial  language  and  the  utterances  made  by  the 
members  of  the  community  speaking  this  language.  Any  naive  identification  will 
quickly  result  in  paradox,  futile  discussions,  and  irrational  distrust  of  theory. 

That  the  question  of  the  adequacy  of  a  given  grammar  is  much  more  complex 
than ordinarily assumed does not mean that this question is a pointless one.
On the contrary, since there exists no simple criterion for deciding which of
two proposed grammars is "better," more adequate than the other, the problem of
finding  any  criterion,  however  partial  and  indirect,  becomes  of  overwhelming 
importance.  The  fact  is,  of  course,  that  extremely  little  is  known  here  be¬ 
yond  programmatic  declarations.  We  know  that  "grammatical"  should  not  be 
identified with "comprehensible," nor is one of these concepts subsumed under
the other, but neither are these two concepts incommensurable. In that
connection we have the large complex of questions arising around degrees of
grammaticalness,  deviancy,  oddness,  and  anomaly;  all  of  vital  importance  to 
linguists  and  philosophers  alike.  Some  of  you  know  the  valiant  beginnings 
made  toward  an  investigation  of  this  problem  by  Chomsky,  Ziff  [43]  and  others, 
but it will, I hope, not deter you from following in their footsteps, if I
state, rather dogmatically, that these attempts are woefully inadequate, while
admitting that I have nothing better to offer, for the moment.

As soon as it is understood that competence and performance are to be kept
clearly apart, one will no longer be tempted to feel oneself obliged to impose
upon, say, the English language a grammar which will not allow the generation
of  sentences  of  a  higher  degree  of  syntactic  complexity  than  some  small  number, 
say  4,  according  to  one  or  the  other  measures  discussed  in  the  previous  lec¬ 
ture.  True  enough,  "corresponding"  utterances  are  not  normally  found  in  speech 
or  writing,  and  if  artificially  produced  will  not  be  grasped  unless  certain 
artificial  auxiliary  means  are  invoked.  These  limitations  of  human  performance 
are doubtless of vital importance, have to be clearly stated and investigated,
and should, sooner or later, be backed up by some neurophysiological theory.
They are of equal importance for the programming of machines which are charged
with determining the syntactic structure of all sentences of any given text
of a given language. That sentences of a high degree of complexity can be
disregarded for this purpose, because of their extreme rarity or just plain
non-occurrence, may allow an organization of the computer's working space that
could  make  all  the  difference  between  the  economically  feasible  and  the  econ¬ 
omically  utopian.  But  in  order  to  do  all  this,  it  is  by  no  means  necessary  to 
impose these restrictions on the grammar of English as such. Nothing is gained,
and much is lost. Not only will certain arbitrary-looking restrictions on the
recursive generation rules have to be imposed, thereby increasing the complexity
of the grammar to a degree that can hardly be estimated at present, but this
procedure is self-defeating. It is done in the name of "sticking to the brute
facts," but doing so in such a crude way will force the adherents of this
approach to disregard other brute facts, such as that with the aid of certain
auxiliary means, the syntactic structure of English word sequences of a
degree of syntactic complexity of 5, or of 100 for that matter, will be perfectly
grasped. Since these word sequences are not English sentences, according to
the grammarians of performance, how come they are understood and what is the
language they belong to?

This does not mean, of course, that restrictions of performance will not
reflect themselves in the grammar. I am convinced, e.g., that Professor
Yngve  has  made  a  remark  full  of  insight  when  he  noticed  and  stressed  the  fact 
that  by  changing  its  mood  from  the  active  to  the  passive,  the  syntactic  com¬ 
plexity  of  a  given  sentence  can  be  reduced.  And  I  have  no  objection  to  formu¬ 
lating  this  insight  in  the  form  that  there  exists  a  passive  in  English  (and  the 
same  or  other  devices  in  other  languages)  in  order  to  allow,  among  other  things, 
the  formulation  of  certain  thoughts  in  sentences  of  a  lower  degree  of  complexity 
than  would  otherwise  have  been  possible.  But  trying  to  obliterate  the  distinc¬ 
tion  between  competence  and  performance,  to  say  it  for  the  last  time,  is  only 
a  sign  of  confusion  and  will  breed  further  confusion.  The  sooner  we  get  rid 
of  these  last  traces  of  extreme  operationalism,  the  better  for  all  of  us,  includ¬ 
ing  MT  research  workers. 

In  order  to  describe  and  explain  the  facts  of  speech  exhaustively  and 
revealingly,  a  full-fledged,  formal  theory  of  language  is  needed,  among  many 
other  things.  Philosophical  prejudice  aside,  there  is  no  particular  merit  in 
keeping this theory "close to the facts," in assuming that the rules of
correspondence which connect the theory (in the narrower sense of the word) with
observation will have a particularly simple form. Experience from other sciences
should have taught us that such an assumption is baseless. Physics, e.g., has
reached its present heights only because the free flight of fancy, "the free
play of ideas," has not been fettered by a narrow conception of scientific
methodology.  True  enough,  the  particular  logical  status  of  these  rules  of  cor¬ 
respondence  has  still  not  been  deeply  enough  investigated,  and  I  fully  under¬ 
stand  the  attitude  of  those  who,  for  this  reason,  regard  this  whole  business 
with  suspicion,  and  are  afraid  that  the  free  flight  of  fancy  will  reintroduce 
uncontrollable metaphysics into science in general and linguistics in
particular. But I hope that the necessary controls will be developed and better
understood  in  the  future  and  that  in  the  meantime  one  will  manage  somehow. 
Occasional  metaphysical  aberrations  are  probably  less  damaging  in  the  long 
run  than  the  curtailment  of  creative  scientific  imagination. 

Let  me  stress,  in  this  connection,  that  the  extensive  use  of  symbolism 
in the formulation of generative grammars has induced many linguists to accuse
the authors of these formulations of having lost all connection with
empirical science and indulged instead in some mathematical surrogate. I hope
that it is now perfectly clear that this accusation is baseless. A formal
grammar  of  English  is  an  empirical  theory  of  the  English  language,  and  its 
symbolic  formulation,  while  it  increases  its  precision  and  therefore  its 
testability,  by  no  means  turns  it  into  a  mathematical  theory.  When  according 
to a certain grammar "Sincerity admires John." turns out not to be a (formal)
sentence whereas this very sequence is considered by someone to be an
(intuitive) sentence, then this grammar is to that degree inadequate to his
intuitions. It should only be kept in mind that the determination of the intuitive
sentencehood  of  "Sincerity  admires  John."  is  by  no  means  such  a  straightforward 
affair  of  observation,  experimentation  and  statistics  as  some  people  believe. 
The notion of "intuitive sentence" is highly theoretical itself (though
without the benefit of a complete theory being formulated to back it up, which
fact  is,  of  course,  the  whole  crux  of  this  peculiar  modifier  'intuitive'),  and 
observations  on  utterances  of  people  or  their  reaction  to  utterances  alone 
will  never  settle  in  any  clearcut  way  the  question  of  the  sentencehood  of  a 
particular  word  sequence.  This  is  as  it  should  be,  and  only  wishful  think¬ 
ing  and  naive  methodology  make  people  believe  otherwise.  Confirmation  and 
refutation  of  linguistic  theories,  as  of  theories  in  any  other  science,  is 
not  such  a  simple  operation  as  one  is  taught  to  believe  in  high  school.  But 
the  complexity  of  refutation  does  not  make  a  linguistic  theory  empirically 
irrefutable  and  therefore  does  not  turn  it  into  a  mathematical  theory. 


FOURTH LECTURE: WHY MACHINES WON'T LEARN TO TRANSLATE WELL


My  arguments  against  the  feasibility  of  high-quality  fully-automatic 
translation  can  be  assumed  to  be  well  known  in  this  audience.  I  have  gone 
through  them  often  enough  in  lectures  and  publications.  I  also  have  the  im¬ 
pression  that,  after  occasionally  rather  strong  initial  negative  reactions,  a 
good  number  of  people  who  have  been  active  in  the  field  of  MT  for  some  years 
tend  more  and  more  to  agree  with  these  arguments,  though  they  might  prefer  a 
more  restrained  formulation.  On  the  other  hand,  the  number  of  research  groups 
which  have  taken  up  MT  as  their  major  field  of  activity  is  still  on  the  in¬ 
crease,  and  by  now  there  is  hardly  a  country  left  in  Europe  and  North  America 
which  does  not  feature  at  least  one  such  group,  with  Japan,  China,  India  and 
a couple of South American countries joining them, for good measure. Though
a certain amount of involvement in MT, and in particular in its theoretical
aspects,  is  certainly  helpful  and  apt  to  yield  fresh  insights  into  the  work¬ 
ings  of  language,  most  of  the  work  that  is  at  present  going  on  under  the  auspi¬ 
ces  of  MT  seems  to  me  to  be  a  wanton  expenditure  of  research  money  that  could 
be  put  to  better  use  in  other  fields  and,  still  worse,  a  deplorable  waste  of 
research  potential. 

The continued interest in MT is sometimes defended on the grounds that
though  it  is  indeed  extremely  unlikely  that  computers  working  according  to  rigid 
algorithms  will  ever  produce  high-quality  translations,  there  still  exists  a 
possibility  that  computers  with  considerable  learning  ("self-organising") 
abilities  will  be  able  through  training  and  experience  to  improve  their  initial 
algorithms  and  thereby  constantly  improve  their  output  until  adequate  quality 
is  achieved.  I  myself  mentioned  the  possibility  in  some  prior  publications 
but  refrained  from  evaluating  it,  at  that  time  regarding  such  an  evaluation 
as premature [15, 45].

During  the  last  two  years,  however,  while  going  through  the  pertinent 
literature  once  more  and  pondering  over  the  whole  issue  of  artificial  intel¬ 
ligence,  I  came  to  more  radical  conclusions  which  I  would  like  to  expose  and 
defend  here.  Today,  I  am  convinced  that  even  machines  with  learning  abilities, 
as  we  know  them  today  or  foresee  them  according  to  known  principles,  will 
not  be  able  to  improve  by  much  the  quality  of  the  translation  output. 




For this purpose, let us notice once more the obvious prerequisites
for high-quality human translation. There are at least the following five
of them, though deeper analysis would doubtless reveal more:

(1)  competent  mastery  of  the  source  language, 

(2)  competent  mastery  of  the  target  language, 

(3)  good  general  background  knowledge, 

(4)  expertness  in  the  field, 
and  (5)  intelligence  (know-how). 

(I admit, of course, that the last of these prerequisites, intelligence, is
not too well defined or understood, and shall therefore have to use it with
a  good  amount  of  caution.) 

All  this  was  surely  common  knowledge  at  all  times,  and  certainly  known 
to all of us "machine translation pioneers" a dozen years ago. I knew then
that nothing corresponding to items (3) and (4) could be expected of
electronic computers, but thought that (1) and (2) should be within their reach,
and  entertained  some  hopes  that  by  exploiting  the  redundancy  of  natural  lan¬ 
guage  texts  better  than  human  readers  usually  do,  we  should  perhaps  be  in  a 
position  to  enable  the  computers  to  overcome,  at  least  partly,  their  lack  of 
knowledge and understanding. True enough, scientists (and almost everyone
else)  write  their  articles  with  a  reader  in  mind  who,  in  addition  to  having 
a  good  command  of  the  language,  has  a  general  background  knowledge  of,  say, 
college  level,  has  so  many  years  of  study  behind  him  in  the  respective  field, 
and  is  intelligent  enough  to  know  how  to  apply  these  three  factors  when 
called  upon  to  do  so.  But  it  could  have  been,  couldn't  it,  that,  perhaps 
inadvertently,  they  do  introduce  sufficient  formal  clues  in  their  publications 
to enable a very ingenious team of linguists and programmers to write a
translation  program  whose  output,  though  produced  by  the  machine  without 
understanding,  would  be  indistinguishable  from  a  translation  done  out  of 
understanding?  After  all,  cases  are  known  of  human  translations  that  were 
done  under  similar  conditions  and  were  not  always  recognized  as  such. 

Well,  it  could  have  been  so,  but  it  just  didn't  turn  out  this  way.  For 
any given source language, there are countless sentences to which a competent
human translator will provide in a given target language many, sometimes very
many, distinct renderings which will sometimes differ from each other only by
minor idiosyncrasies, but will at other times be toto coelo different. The
original sentence will very often be, as the standard expression goes,
multiply ambiguous by itself, morphologically, syntactically, and semantically, but
the competent human translator will render it, in its particular context,
uniquely to the general satisfaction of the human reader. The translator will
resolve these ambiguities out of the last three factors mentioned. Though it
is undoubtedly the case that some reduction of ambiguity can be obtained
through better attention to certain formal clues, and though it has turned
out many times that what superficial thinking regarded as definitely
requiring understanding could be handled through certain refinements of purely
formal methods, it should by now be perfectly clear that there are limits to what
these  refinements  can  achieve,  limits  that  definitely  block  the  way  to  autono¬ 
mous,  high-quality,  machine  translation. 

Could not perhaps computers with learning capacity do the job? Let me
say rather dogmatically that a close study of one of the most publicized schemes
for the mechanization of problem solving and a somewhat less detailed study of
the whole field of Artificial Intelligence has shown an amount of careless and
irresponsible talk which is nothing short of appalling and sometimes close to
lunatic. There is absolutely nothing in all this talk which shows any promise
to be of real help in mechanizing translation. There is nothing to indicate
how computers could acquire what the famous Swiss linguist de Saussure called,
at the beginning of this century, the faculté de langage, an ability which is
today innate in every human being, but which took evolution hundreds of
millions of years to develop. Let nobody be deceived by the term "machine
language," which may be suggestive for other purposes but which has turned out to
be detrimental in the present context. Surely computers can manipulate
symbols if given the proper instructions and they do it splendidly, many times
quicker and safer than humans, but the distance from symbol manipulation to
linguistic  understanding  is  enormous,  and  loose  talk  will  not  diminish  it. 

Though certain electronic devices (such as perceptrons) have been built
which can be "trained" to perform certain tasks (such as pattern
recognition) and indeed perform better after training than before, and though
computers have been programmed to do certain things (such as playing checkers) and
do  these  things  better  after  a  period  of  learning  than  before,  it  would  be 
disastrous  to  extrapolate  from  these  primitive  exhibitions  of  artificial  intel¬ 
ligence  to  something  like  translation.  There  just  is  no  serious  basis  for 
such  extrapolation.  As  to  checkers,  the  definition  of  "legal  move"  is  extreme¬ 
ly  simple  and  is,  of  course,  given  the  computer  in  full.  After  a  few  years 
of  work  the  inventor  of  the  checker  playing  program  [46]  succeeded  in  forma¬ 
lizing  a  good  set  of  strategies  so  that  the  training  had  nothing  more  to 
achieve  than  to  introduce  certain  changes  in  the  rank-ordering  of  these  stra¬ 
tegies.  There  never  was  any  question  of  training  the  computer  to  discover  the 
rules  of  checkers,  or  to  expand  an  incomplete  set  of  rules  into  a  complete 
one,  or  to  add  new  strategies  to  those  given  it  beforehand.  But  some  people 
do talk about letting computers discover rules of grammar or expand an
incomplete set of such rules fed into it, by going over large texts and using
"induction." But let me repeat, this talk is quite irresponsible and
"induction" is nothing but a magic word in this connection. All attempts at
formalizing what they believe to be inductive inference have completely failed,
and  inductive  inference  machines  are  pipe  dreams  even  more  than  autonomous 
translation  machines. 

Now  children  do  learn,  as  we  all  know,  their  native  language  up  to  an 
almost  complete  mastery  of  its  grammar  by  the  time  they  are  four  or  five  years 
old. But by the time they reach this age, they have heard (and spoken) surely
no more than a few hundred thousand utterances in their native language (only
a part of which are good textbook specimens of grammatical sentences). If
they succeeded in mastering the grammar, apparently "by induction" from these
utterances, why shouldn't a computer be able to do so? Even if we add the
fact that these children were also told that so many word sequences were not
grammatical sentences, whatever the form was by which they were given these
pieces of instruction, could not the same procedure be mirrored for computers?
Well, the answer to these two questions can be nothing but an uncompromising
No. The children are able to perform as splendidly as they do because, in
addition to the training and learning, their brain is not a tabula rasa general
purpose computer but a computer which, after all those hundreds of thousands
of years of evolution mentioned before, is also special purpose structured in
such a way that it possesses the unique faculté de langage which makes it so
different from the brain of mice, monkeys, and machines. The fact that we
know  close  to  nothing  about  this  structure  does  not  turn  the  previous  state¬ 
ment  into  a  scholastic  truism. 

Years  of  most  patient  and  skillful  attempts  at  teaching  monkeys  to  use 
language  intelligently  succeeded  in  nothing  better  than  making  them  use  four 
single  words  with  understanding,  and  monkeys'  brains  are  in  many  respects 
vastly  superior  to  those  of  computers.  True  enough,  computers  can  do  many 
things  better  than  monkeys  or  humans,  computing  for  instance,  but  then  we 
know  the  corresponding  algorithms,  and  know  how  to  feed  them  into  the  com¬ 
puter.  In  some  cases  we  know  algorithms  which,  when  fed  into  the  computer, 
will  enable  it  to  construct  for  itself  computing  algorithms  out  of  other 
data  and  instructions  that  can  be  fed  into  it.  But  nothing  of  the  kind  is 
known  with  respect  to  linguistic  abilities.  So  long  as  we  are  unable  to 
wire  or  program  computers  so  that  their  initial  state  will  be  similar  to 
that  of  a  newborn  human  infant,  physically  or  at  least  functionally,  let's 
forget  about  teaching  computers  to  construct  grammars. 

Let  me  now  turn  to  the  first  two  items.  What  is  the  outlook  for  compu¬ 
ters  to  master  a  natural  language  to  approximately  the  same  degree  as  does  a 
native  speaker  of  such  a  language?  And  by  "mastering  a  language"  I  now  mean, 
of  course,  only  a  mastery  of  its  grammar,  i.e.  vocabulary,  morphology,  and 
syntax, to the exclusion of its semantics and pragmatics. Until recently,
I think that most of us who dealt with MT at one time or another believed
that  not  only  was  this  aim  attainable,  but  that  it  would  not  be  so  very  dif¬ 
ficult  to  attain  it,  for  the  practical  purpose  at  hand.  One  realized  that 
the  mechanization  of  syntactic  analysis,  based  on  this  mastery,  would  lead 
on  occasion  to  multiple  analyses  whose  final  reduction  to  a  unique  analysis 
would  then  be  relegated  to  the  limbo  of  semantics,  but  did  not  tend  to  take 
this drawback very seriously. It seems that here, too, a more sober appraisal
of the situation is indicated and already is gaining ground, if I am not
mistaken.  More  and  more  people  have  become  convinced  that  the  inadequacies 
of  present  methods  of  mechanical  determination  of  syntactic  structure,  in 
comparison with what competent and linguistically trained native speakers
are  able  to  do,  are  not  only  due  to  the  fact  that  we  don't  know  as  yet 
enough  about  the  semantics  of  our  language  —  though  this  is  surely  true 
enough  —  but  also  to  the  perhaps  not  too  surprising  fact  that  the  grammars 
which were in the back of the minds of almost all MT people were of too
simple  a  type,  namely  of  the  so-called  immediate  constituent  type,  though 
it  is  quite  amazing  to  see  how  many  variants  of  this  type  came  up  in  this 
connection. 

Leaving  aside  the  question  of  the  theoretical  inadequacy  of  immediate 
constituent  grammars  for  natural  languages,  the  following  fact  has  come  to 
the  fore  during  the  last  few  years:  If  one  wants  to  increase  the  degree  of 
approximate  practical  adequacy  of  such  grammars,  one  has  to  pay  an  enormous 
price  for  this,  namely  a  proliferation  of  rules  (partly,  but  not  wholly, 
caused  by  a  proliferation  of  syntactic  categories)  of  truly  astronomic 
nature.  The  dialectics  of  the  situation  is  distressing:  the  better  the 
understanding of linguistic structure, and the greater our mastery of the
language, the larger the set of grammatical rules we need to describe the
language,  the  heavier  the  preparatory  work  of  writing  the  grammar,  and  the 
costlier  the  machine  operations  of  storing  and  working  with  such  a  grammar. 

It  is  very  often  said  that  our  present  computers  are  already  good  enough 
for  the  task  of  MT  and  will  be  more  than  sufficient  in  their  next  generation, 
but  that  the  bottleneck  lies  mostly  in  our  insufficient  understanding  of  the 
workings  of  language.  As  soon  as  we  know  all  of  it,  the  problem  will  be 
licked. I shall not discuss here the extremely dubious character of this
"knowing  all  of  it,"  but  only  point  out  that  the  more  we  shall  know  about 
linguistic  structure,  the  more  complex  the  description  of  this  structure  will 
become,  so  long  as  we  stick  to  immediate  constituent  grammars.  It  is  known 
that in some cases transformational grammars are able to reduce the
complexity of the description by orders of magnitude. Whether this holds in
general remains to be seen, but the time has come for those interested in
the mechanical determination of syntactic structure, whether for its own
sake, for MT or for other applications, to get out of the self-imposed
straitjacket of immediate constituent grammars and start working with
more powerful models, such as transformational grammars.

Let  me  illustrate  by  just  one  example:  one  of  the  best  programs  in 
existence,  on  one  of  the  best  computers  in  existence,  recently  needed  12 
minutes (and something like $100 on a commercial basis) to provide an
exhaustive syntactic analysis of a 35-word sentence [47]. I understand that
the  program  has  been  improved  in  the  meantime  and  that  the  time  required 
for  such  an  analysis  is  now  closer  to  one  minute.  However,  the  output 
of  this  analysis  is  multiple,  leaving  the  selection  of  the  single  analysis, 
which  is  correct  in  accordance  with  context  and  background,  to  other  parts 
of  the  program  or  to  the  human  posteditor.  But  there  are  other  troubles  with 
using immediate constituent grammars only for MT purposes. In his lecture
to this Institute, Mr. Gross gave an example of a French sentence in the
passive  mood  which  could  be  translated  into  English  only  by  ad  hoc  procedures 
so  long  as  its  syntactic  analysis  is  made  on  an  immediate  constituent  basis 
only.  The  translation  into  English  is  straightforward  as  soon  as  the  French 
sentence  is  first  detransformed  into  the  active  mood.  A  grammar  which  is 
unable  to  provide  this  conversion,  besides  being  scientifically  unsatisfactory, 
will increase the difficulties of MT.

In the time left to me I would like to return to what is perhaps the
most widespread fallacy connected with MT, the fallacy I call, in variation
of  a  well  known  term  of  Whitehead,  The  Fallacy  of  Misplaced  Economy.  I  refer 
to  the  idea  that  indirect  machine  translation  through  an  intermediate  language 
will  result  in  considerable  to  vast  economies  over  direct  translation  from 
source to target language, on the obvious condition that should MT turn out
to  be  feasible  at  all,  in  some  sense  or  other,  many  opportunities  for  simul¬ 
taneous  translation  from  one  source  language  into  many  target  languages  (and 
vice versa) will arise. I already once before discussed both the
attractiveness of this idea and the fallaciousness of the reasoning behind it. Let
me therefore discuss here at some length only what I regard to be the
kernel  of  the  fallacy. 

The following argument has great appeal: Assume that we deal
with 10 languages, and that we are interested in translating from each language
into  every  other,  i.e.,  altogether  90  translation  pairs.  Assume,  for  simpli¬ 
city's  sake,  that  each  translation  algorithm  —  never  mind  the  quality  of  the 
output  —  requires  100  man-years.  Then  the  preparation  of  all  the  algorithms 
will  require  9000  man-years.  If  one  now  designates  one  of  these  languages  as 
the pivot language, then only 18 translation pairs will be needed, requiring
1800  man-years  of  preparation,  an  enormous  saving.  True  enough,  translation 
time  for  any  of  the  remaining  72  language  pairs  will  be  approximately  doubled, 
and  the  quality  of  the  output  will  be  somewhat  reduced,  but  this  would  be  a 
price worth paying. (In general, the argument is presented with some
artificial language serving as the pivot. Though this move changes the appeal of
the argument for the better, since this artificial pivot language is
supposed to be equipped with certain magical qualities, as well as for the
worse, since the number of translation algorithms now increases to 20, I don't
think that thereby the substance of the following counterargument is weakened.)
However, in order to counteract even this deterioration, let us double our
effort and spend, say, 200 man-years on the preparation of the algorithms for
translating to and from the pivot language. We would still wind up with no
more than 3600 man-years of work, vs. the 9000 originally needed. Well?
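
The arithmetic of the appealing argument can be checked with a short sketch. The function names `direct_pairs` and `pivot_pairs` are hypothetical labels introduced only for this illustration; the figures of 10 languages and 100 or 200 man-years per algorithm are those assumed in the argument itself. With 18 pivot algorithms at 200 man-years each, the doubled effort comes to 18 × 200 = 3600 man-years.

```python
def direct_pairs(n):
    """Ordered source-target pairs among n languages: n * (n - 1)."""
    return n * (n - 1)


def pivot_pairs(n, artificial=False):
    """Algorithms needed when all translation goes through one pivot.

    With a natural pivot chosen among the n languages, each of the other
    n - 1 languages needs one algorithm to and one from the pivot.
    With an artificial pivot, all n languages need both.
    """
    return 2 * n if artificial else 2 * (n - 1)


N = 10
direct_cost = direct_pairs(N) * 100                # 90 algorithms, 100 man-years each
pivot_cost = pivot_pairs(N) * 100                  # 18 algorithms, 100 man-years each
pivot_cost_doubled = pivot_pairs(N) * 200          # 18 algorithms, 200 man-years each
doubled_routes = direct_pairs(N) - pivot_pairs(N)  # pairs that must pass through the pivot
```

The sketch reproduces only the bookkeeping of the argument (90 versus 18 or 20 algorithms, 72 pairs routed through the pivot); it takes over the argument's assumption that every algorithm costs the same, which is exactly the assumption questioned next.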

The fallacy, so it seems to me, lies in the following: the argument would
hold if the preparation of the 90 algorithms were to be done independently
and simultaneously by different people, with nobody learning from the
experience of his co-workers. This is surely a highly unrealistic
assumption. If preparing the Russian-to-English and German-to-English
algorithms were to take 100 man-years each, when done this way, there can be
no doubt that preparing the German-to-English algorithm after completion (or
even partial completion) of a successful Russian-to-English algorithm will
take much less time, perhaps half as much. The next pair, say
Japanese-to-English, will take still less time, etc. All these figures being
utterly arbitrary, I don't think we should go on bothering about the
convergence of this series. Though we might still wind up with a larger time
needed for the preparation of the 90 than of the 18 "double precision"
algorithms, it is doubtful, to say the least, whether the overall
quality/preparation-time/translation-time balance would be in favor of the
pivot language approach.
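
How strongly the comparison depends on the arbitrary figures can be seen
from a small sketch. It assumes, purely for illustration, that each further
direct algorithm costs a fixed fraction of the previous one but never less
than some floor; the ratio, the floor, and all names below are hypothetical
and are not taken from the lectures.

```python
def direct_total(n_algorithms: int, first_cost: float,
                 ratio: float, floor: float) -> float:
    """Total man-years for n algorithms prepared one after another.

    Each algorithm costs `ratio` times the previous one (learning from
    experience), but never less than `floor` man-years.
    """
    total = 0.0
    cost = first_cost
    for _ in range(n_algorithms):
        total += max(cost, floor)
        cost *= ratio
    return total


# 18 "double precision" pivot algorithms at 200 man-years each.
PIVOT_TOTAL = 18 * 200  # 3600

# Halving with a floor of 50 man-years: direct approach still costs more.
print(direct_total(90, 100, 0.5, 50), PIVOT_TOTAL)  # 4550.0 3600

# Halving with a floor of 25 man-years: direct approach costs less.
print(direct_total(90, 100, 0.5, 25), PIVOT_TOTAL)  # 2350.0 3600

# No learning at all recovers the 9000 man-years of the original argument.
print(direct_total(90, 100, 1.0, 0))                # 9000.0
```

Under these assumed parameters the outcome flips with the floor alone, which
is one way of seeing why preparation time by itself cannot settle the
question.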

Add to this the fact that 100 man-years would be enough, by assumption, to
start a working MT outfit along the direct approach, whereas 400 man-years
will be needed even to start translating the first pair along the indirect
approach, and the initial appeal of the intermediate language idea should
completely vanish, when judged from a practical point of view. As to its
speculative impact, enough has been said on other occasions.

I think it is my duty to state at the end of this lecture series where all
this leaves us. Autonomous, high-quality machine translation between natural
languages according to rigid algorithms may safely be considered as dead.
Such translation on the basis of learning abilities is still-born. Though
machines could doubtless provide a great variety of aids to human
translation, so far in no case has economic feasibility of any such aid been
proven, though the outlook for the future is not all dark. So much for the
debit side. On the credit side of the past MT efforts stands the enormous
increase of interest which has already begun to pay off not only in an
increased understanding of language as such, but also in such applications
as the mechanical translation between programming languages. But this could
already be a topic for another Institute.


REFERENCES 


[1] Harris, Z. S. Methods in structural linguistics. Chicago, Univ. of Chicago
Press, 1951.

[2] Hockett, C. F. Two models of grammatical description. Word, vol. 10 (1954),
210-231 (reprinted as Ch. 39 in Readings in linguistics (M. Joos, ed.),
Washington D. C., American Council of Learned Societies).

[3] Hjelmslev, L. Prolegomena to a theory of language (tr. by F. J. Whitfield),
Baltimore, Waverly Press, 1953.

[4] Uldall, H. Outline of glossematics. Copenhagen, Nordisk Sprog- og
Kulturforlag, 1957.

[5] Carnap, R. The logical syntax of language. New York, Harcourt, Brace & Co.,
1937.

[6] Ajdukiewicz, K. Die syntaktische Konnexität. Studia Philosophica, vol. 1
(1935), 1-27.

[7] Post, E. L. Formal reductions of the general decision problem. American
Journal of Mathematics, vol. 65 (1943), 197-215.

[8] Davis, M. Computability and unsolvability. New York, McGraw-Hill, 1958.

[9] Curry, H. B. and Feys, R. Combinatory logic. Amsterdam, North-Holland
Publishing Co., 1958.

[10] Chomsky, N. Syntactic Structures. 's-Gravenhage, Mouton & Co., 1957.

[11]  Bar-Hillel,  Y.  A  quasi-arithmetical  notation  for  syntactic  description. 
Language,  vol.  29  (1953),  47-58. 

[12] Lambek, J. The mathematics of sentence structure. American Mathematical
Monthly, vol. 65 (1958), 154-170.

[13] Lambek, J. Contributions to a mathematical analysis of the English
verb-phrase. Journal of the Canadian Linguistic Association, vol. 5 (1959),
83-89.

[14] Lambek, J. On the calculus of syntactic types. Twelfth Symposium in
Applied Mathematics (R. Jakobson, ed.), Providence, R. I., American
Mathematical Society, 1961.

[15] Bar-Hillel, Y. The present status of automatic translation of languages,
Appendix II, in Advances in Computers, Vol. I (F. L. Alt, ed.), New York,
Academic Press, 1960.

[16] Bar-Hillel, Y., Gaifman, C., and Shamir, E. On categorial and
phrase-structure grammars. Bulletin of the Research Council of Israel, vol. 9F
(1960), 1-16.

[17] Rabin, M. O., and Scott, D. Finite automata and their decision problems.
IBM Journal of Research and Development, vol. 3 (1959), 115-125.

[18] Bar-Hillel, Y., and Shamir, E. Finite-state languages: Formal
representations and adequacy problems. Bulletin of the Research Council of
Israel, vol. 8F (1960), 155-166.


[19] Bar-Hillel, Y., Perles, M., and Shamir, E. On formal properties of simple
phrase-structure languages. Zeitschrift für Phonetik, Sprachwissenschaft
und Kommunikationsforschung, vol. 14 (1961), 143-172.

[20] Post, E. A variant of a recursively unsolvable problem. Bulletin of the
American Mathematical Society, vol. 52 (1946), 264-268.

[21] Kleene, S. C. Representation of events in nerve nets and finite automata.
Automata Studies (C. E. Shannon and J. McCarthy, eds.), Princeton Univ.
Press, 1956.

[22] Ginsburg, S., and Rice, H. G. Two families of languages related to ALGOL.
TM-578/000/01, SDC, Santa Monica, Cal., July 1961.

[23]  Ginsburg,  S.,  and  Rose,  G.  F.  Operations  which  preserve  definability  in 
languages.  SP-511,  SDC,  Santa  Monica,  Cal.,  October  1961. 

[24] Shamir, E. On sequential languages. Applied Logic Branch, Technical Report
No. 7 (prepared for the Office of Naval Research, Information Systems
Branch), Hebrew University, Jerusalem, Israel, November 1961.

[25] Hays, D. G. Grouping and dependency theories. P-1910, Rand Corporation,
Santa Monica, Cal., 1960.

[26] Lecerf, Y. and Ihm, P. Éléments pour une grammaire générale des langues
projectives. Rapport GRISA, No. 1, 11-29, 1960.

[27] Tesnière, L. Éléments de syntaxe structurale. Paris, Klincksieck, 1959.

[28] Gaifman, C. Dependency systems and phrase-structure systems. P-2315,
Rand Corporation, Santa Monica, Cal., 1961.

[29]  Chomsky,  N.  Formal  properties  of  grammars.  Handbook  of  Mathematical 
Psychology  (in  press). 

[30] Yngve, V. H. A model and an hypothesis for language structure. Proceedings
of the American Philosophical Society, vol. 104 (1960), 444-466.

[31] Yngve, V. H. The depth hypothesis. Twelfth Symposium in Applied Mathematics
(R. Jakobson, ed.), Providence, R. I., American Mathematical Society, 1961.

[32] Myhill, J. Linear bounded automata. Wright Air Development Division,
Technical Note 60-165, 1960.

[33] McNaughton, R. The theory of automata, a survey. Advances in Computers,
Vol. II (Franz L. Alt, ed.), New York, Academic Press, 1962.

[34] Flesch, R. A new readability yardstick. Journal of Applied Psychology,
vol. 32 (1948), 221-233.

[35] Chomsky, N. and Miller, G. A. Finitary models of language users. Handbook
of Mathematical Psychology (in press).

[36] Chomsky, N. On certain formal properties of grammars. Information and
Control, vol. 2 (1959), 137-167.

[37] Chomsky, N. On the notion "rule of grammar". Twelfth Symposium in Applied
Mathematics (R. Jakobson, ed.), Providence, R. I., American Mathematical
Society, 1961.

[38] Chomsky, N. Explanatory models in linguistics. Logic, Methodology and
Philosophy of Science: Proceedings of the 1960 International Congress
(E. Nagel, P. Suppes, and A. Tarski, eds.), Stanford, Cal., Stanford
University Press, 1962.

[39] Church, A. Introduction to mathematical logic, vol. I. Princeton Univ.
Press, 1956.

[40] Bar-Hillel, Y. Recursive definitions in empirical sciences. Proceedings
of the Eleventh International Congress of Philosophy, vol. 5 (1953),
Brussels, 160-165.

[41] Bar-Hillel, Y. Three remarks on linguistics fundamentals. Word, vol. 13
(1957), 323-335.

[42] Quine, W. V. From a logical point of view. Harvard University Press, 1953.

[43] Ziff, P. Semantic analysis. Ithaca, New York, Cornell Univ. Press, 1960.

[44] Carnap, R. Logical foundations of probability. University of Chicago
Press, 1950.

[45] Bar-Hillel, Y. The future of machine translation. Freeing the Mind.
Articles and Letters from Times Literary Supplement during March-June
1962, 32-37.

[46] [entry illegible in source]

[47] Kuno, S. and Oettinger, A. G. Multiple-path syntactic analyzer. Proceedings
of the IFIP Congress 1962 (in press).


